the Creative Commons Attribution 4.0 License.
Drivers of global irrigation expansion: the role of discrete global grid choice
Abstract. Global statistical irrigation modeling relies on geospatial data and traditionally adopts a discrete global grid based on a longitude-latitude reference. However, this system introduces area distortion, which may lead to biased results. We propose using the ISEA3H geodesic grid based on hexagonal cells, enabling efficient and distortion-free representation of spherical data. To understand the impact of discrete global grid choice, we employ a non-parametric statistical framework based on random forest methods to identify the main drivers of historical global irrigation expansion, drawing, among other inputs, on outputs from the global dynamic vegetation model LPJmL.
Irrigation is critical for food security amidst growing population, changing consumption patterns, and climate change. It significantly boosts crop yields but also alters the natural water cycle and global water resources. Understanding past irrigation expansion and its drivers is vital for global change research, resource assessment, and predicting future trends.
We compare the predictive accuracy, the simulated irrigation patterns and identification of irrigation drivers between the two grid choices. Results demonstrate that using the ISEA3H geodesic grid increases the predictive accuracy by 29 % compared to the longitude-latitude grid. The model identifies population density, potential productivity increase, evaporation, precipitation, and water discharge as key drivers of historical global irrigation expansion. GDP per capita also shows minimal influence.
We conclude that the geodesic discrete global grid significantly affects predicted irrigation patterns and identification of drivers, and thus has the potential to enhance statistical modeling, which warrants further exploration in future research across related fields. This analysis lays the foundation for comprehending historical global irrigation expansion.
Status: closed
RC1: 'Comment on hess-2023-273', Anonymous Referee #1, 20 Dec 2023
This manuscript is dedicated to exploring the use of hexagons to represent irrigation maps and identify irrigation drivers, which is interesting. However, this study has significant flaws. First, area deformation is more severe at high latitudes, but the global irrigated area is more distributed in the mid-latitudes and this effect would not be significant. Second, I think there are currently feasible ways to correct for the effect of latitudinal differences on area when counting global land averages. Third, the hexagon used is too large, too low resolution, and contains a large error. It is difficult to compare it directly with higher resolution grid data. Fourth, the importance of the random forest variables is strongly dependent on the data resolution, and the importance of variables with low resolution may be underestimated. Fifth, the sample sizes are an order of magnitude different when comparing precision in scatterplots, and the supposed improvement in precision derived in this way is not necessarily reliable. Sixth, the writing in this manuscript could be improved. For example, it is not necessary to write the results and findings of the study in the last paragraph of the INTRODUCTION.
Citation: https://doi.org/10.5194/hess-2023-273-RC1
AC1: 'Reply on RC1', Sophie Wagner, 17 Jan 2024
Thank you for taking the time to read our manuscript and for your comments. We agree that our study could benefit from a sensitivity analysis with respect to grid cell resolution. We will thus present a robustness section in which we compare to an adjusted hexagonal grid size, such that either the total number of cells is in the same ballpark as the lon-lat grid, or the cells are about the same size as the average grid cell of the lon-lat grid. Second, we will present the results for a classical area weighting, by including weights in the lon-lat model. This addresses reviewer points 2-5 by including versions that are comparable to the higher grid resolution data. We will thus be able to assess whether results depend on changes in grid cell size. It remains to be seen whether the issue of area deformation is insignificant at these higher resolutions (reviewer point 1), and we would like to keep this as a hypothesis to be tested in this paper. Further, we will do our best to improve our writing and eliminate excessive paragraphs (reviewer point 6).
Citation: https://doi.org/10.5194/hess-2023-273-AC1
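The "classical area weighting" mentioned in this reply can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a regular lon-lat grid with equal angular spacing, for which the area of a cell is proportional to the cosine of its centre latitude, and it uses hypothetical toy values. The helper names `area_weights` and `weighted_mean` are invented for illustration.

```python
import math

def area_weights(lat_centers_deg):
    """Weights proportional to cos(latitude), normalised to a mean of 1.

    Assumes cells of equal angular size on a sphere (illustrative assumption).
    """
    raw = [math.cos(math.radians(lat)) for lat in lat_centers_deg]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical toy data: three cells at different latitudes.
lats = [0.25, 45.25, 75.25]
irrigated_fraction = [0.30, 0.10, 0.00]

w = area_weights(lats)
print("area-weighted mean:", weighted_mean(irrigated_fraction, w))
print("unweighted mean:   ", sum(irrigated_fraction) / len(irrigated_fraction))
```

The same weights could in principle be passed as per-sample weights to a statistical model, so that cells near the poles do not count as much as the larger cells near the equator.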
CC1: 'Comment on hess-2023-273', Marko Kallio, 05 Jan 2024
The study presents an interesting comparison of discrete grid choice in a statistical modelling context. However, I think that the study does not really provide a fair comparison between the square and hexagonal grids.
1. The main reason is that the two grids are of vastly different size, with the hexagonal grid containing nearly 10 times fewer cells than the original square grid. The model trained on the hexagonal grid is working on aggregates of the original data, while the square grid model works with data in the original resolution. It is generally easier to model averaged data than data with original granularity. I would expect that the difference mostly disappears if the hexagonal grid has a similar resolution to the 0.5 degree square grid.
2. Relating to the above, the square grid would have to be approximately 1.5 degrees in resolution (7200 grid cells) to be approximately similar resolution to the hexagonal grid used. With this resolution, I do expect differences between the two configurations arising from the different shapes. However, I think that this is far too poor resolution to model global irrigation, given the availability of data and computing power available these days.
3. The use of the hexagonal grid is justified with the area distortion of the geographic coordinate systems, which the hexagonal grid in a geographic coordinate system. However, I would like to point out that the use of geographic coordinate system should have no influence in statistical modelling; computing correct areas for grids in geographical coordinate systems can be done with any major spatial library. If wrong areas are used, it is an error in the methodology, not a flaw in the geographic coordinate system. In addition, projecting the data to an equal-area projection (such as Mollweide or Equal Earth) eliminates the issue. For these reasons I think that the choice of the shape of the spatial unit of analysis, in this context, makes little difference. It would provide some benefits in use cases sensitive to the distances between the cell neighbours, such as the examples given in introduction (flow directions, watershed delineation, or stream network extraction). However, I am open to be shown wrong in this matter.
Citation: https://doi.org/10.5194/hess-2023-273-CC1
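Point 3 of the comment above notes that correct areas for cells in a geographic coordinate system can be computed directly. As a minimal sketch, assuming a spherical Earth with a mean radius of 6371 km (an assumption; spatial libraries typically use an ellipsoid), the area of a lon-lat cell is R² · Δλ · (sin φ_north − sin φ_south). The function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation (assumption)

def lonlat_cell_area_km2(lat_south_deg, lat_north_deg, dlon_deg,
                         radius_km=EARTH_RADIUS_KM):
    """Area of a lon-lat cell on a sphere: R^2 * dlon * (sin(lat_n) - sin(lat_s))."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))
    return radius_km ** 2 * dlon * band

# Compare a 0.5 degree cell at the equator with one at 60 degrees north.
equator = lonlat_cell_area_km2(0.0, 0.5, 0.5)
high_lat = lonlat_cell_area_km2(60.0, 60.5, 0.5)
print(f"0.5 deg cell at the equator: {equator:.0f} km2")
print(f"0.5 deg cell at 60 N:        {high_lat:.0f} km2")
print(f"ratio: {high_lat / equator:.2f}")
```

Under this approximation, a cell at 60° latitude covers roughly half the area of an equatorial cell of the same angular size, which is the distortion both the manuscript and the comment refer to.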
AC2: 'Reply on CC1', Sophie Wagner, 17 Jan 2024
Thank you for taking the time to read our manuscript and for your comments. We agree that our study could benefit from a sensitivity analysis with respect to grid cell resolution, which we will implement in a revised manuscript as outlined in response to reviewer 1. This will deal with your points 1-2. With respect to your point 3, on top of the importance of topological relations (which we agree are less important in our application, though the analysis could be extended to include spatial autocorrelation, for example), to us the issue is one of representation: locations near the equator are represented by relatively fewer data points because the grid cells are larger, while points towards the poles are represented by relatively more data points because the grid cells are smaller. Admittedly, the root of this problem lies in the mapping of point measurements of variables onto a grid, reinforced by running models on that same grid. As a first step towards analysing the impacts of this problem, we remap from the lon-lat to the hexagonal grid, thereby emulating the problem. Any aggregation biases are hopefully minimised now that we go down to a hexagonal grid resolution that matches the size of the lon-lat grid.
Citation: https://doi.org/10.5194/hess-2023-273-AC2
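The representation argument in this reply can be made concrete with a small sketch that compares, for a latitude band, the share of cells of a regular lon-lat grid with the share of the sphere's surface. It assumes a spherical Earth and counts all cells, land and ocean alike, which is a simplification relative to the terrestrial cells used in the study. The function names are hypothetical.

```python
import math

def share_of_cells(lat_min_deg, lat_max_deg):
    """Fraction of cells of a regular lon-lat grid lying between two latitudes."""
    return (lat_max_deg - lat_min_deg) / 180.0

def share_of_area(lat_min_deg, lat_max_deg):
    """Fraction of the sphere's surface lying between two latitudes."""
    top = math.sin(math.radians(lat_max_deg))
    bottom = math.sin(math.radians(lat_min_deg))
    return (top - bottom) / 2.0

for lat_min, lat_max in [(0, 30), (30, 60), (60, 90)]:
    cells = share_of_cells(lat_min, lat_max)
    area = share_of_area(lat_min, lat_max)
    print(f"{lat_min:2d}-{lat_max:2d} deg N: {cells:.3f} of cells, {area:.3f} of area")
```

Each 30° band holds the same share of cells, while its share of surface area shrinks towards the pole, so high-latitude locations contribute more data points per unit area.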
RC2: 'Comment on hess-2023-273', Anonymous Referee #2, 24 Apr 2024
The manuscript "Drivers of global irrigation expansion: the role of discrete global grid choice" discusses the adoption of hexagonal cells, as an alternative to the traditional longitude-latitude grid for global statistical irrigation modeling. The authors suggest that the new method can mitigate the area distortion and increase predictive accuracy, and the random forest method was used to identify the potential drivers of historical global irrigation expansion. Overall, this study represents an attempt to address the challenges associated with traditional grid systems in global irrigation analysis. However, there are significant areas that require further investigation to fully substantiate the authors' claims, particularly the potential bias introduced by grid cell size, which may make the comparison less fair.
Specific comments:
1) I think there seems to be a disconnect between the discussion of grid bias and the examination of global irrigation drivers in the Introduction. It would strengthen the manuscript if the authors explicitly linked how grid biases might affect the analysis of irrigation drivers, especially given the fact that significant irrigated areas are concentrated in low to mid latitudes where these biases are less pronounced (as shown in Fig. 1).
2) For the data, LPJmL primarily uses biophysical inputs. If socioeconomic factors significantly influence irrigation practices, and are not included in the model's parameters, how can the simulation data be used to study the socioeconomic impact? Also, is there any validation of the LPJmL results with observational datasets?
3) The detailed description of the random forest algorithm (like Figure 2 and Algorithm 1) might be unnecessary if no modifications were made to the algorithm in this study, given the popularity of random forest in the geoscience literature.
4) For the results in Figure 3, given the difference in grid cell sizes between the longitude-latitude grid and the ISEA3H grid, I am concerned that the larger grid cells in the ISEA3H model may be smoothing out prediction errors that are apparent in the finer-scale longitude-latitude grid. Therefore, an analysis of how grid cell size might affect apparent accuracy would be appreciated. In addition, could you provide R² values for the predictive performance of each grid model, in addition to RMSE and NRMSE?
5) The same would apply to variable importance. Different grid sizes could lead to different interpretations of what is most important in determining irrigation expansion, potentially biasing the model results. In other words, the difference between the two grid systems in the study may be the result of different grid sizes. Therefore, it is worth investigating how the grid cell size may affect the importance of variables in the random forest model.
Citation: https://doi.org/10.5194/hess-2023-273-RC2
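The error measures named in point 4 of the referee comment above (RMSE, NRMSE and R²) can be sketched as follows. The normalisation used for NRMSE is not stated on this page; dividing by the observed range is one common convention and is an assumption here, as are the toy values.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nrmse(obs, pred):
    """RMSE divided by the observed range (one of several conventions; assumption)."""
    return rmse(obs, pred) / (max(obs) - min(obs))

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, e.g. irrigated fraction per grid cell.
observed = [0.0, 0.1, 0.4, 0.8]
predicted = [0.05, 0.1, 0.3, 0.9]

print("RMSE: ", rmse(observed, predicted))
print("NRMSE:", nrmse(observed, predicted))
print("R2:   ", r_squared(observed, predicted))
```

Because RMSE depends on the scale and spread of the target, and aggregation to larger cells changes that spread, reporting a normalised measure and R² alongside RMSE makes models fitted on grids of different cell size easier to compare.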
AC3: 'Reply on RC2', Sophie Wagner, 10 May 2024
Thank you for your comments and suggestions and for taking the time to read our paper.
1. We agree that we should underline the reasoning for including all terrestrial grid cells in our analysis already in the introduction. Our aim is to explicitly incorporate the probability of observing irrigation, in addition to the irrigation amount, into the analysis, and hence the area that is not irrigated is important.
2. LPJmL uses historical inputs, which also represent socioeconomic influences: e.g. land use fractions (which crop has been grown where reflects, e.g., farmers' choices based on subsidies), management options such as irrigation systems, etc. Our statistical analysis is designed to assess the influence of precisely these drivers.
3. Thank you for your feedback. We would like to explain the methods we are using and would therefore like to keep this part here. However, we will check if there are parts that can be condensed.
4. We agree that grid cell sizes are crucial in our analysis. We computed additional robustness specifications to benchmark a standard longitude-latitude grid with incorporated area weights, and ISEA3H grids with resolution 8 (to have the average grid cell size equal to the lon-lat grid) and resolution 9 (to have a total number of grid cells in the same ballpark as the lon-lat grid). We also implemented a bootstrapping method to further investigate the sensitivity of our results. We would like to include these results in the next version of the paper. Thank you for the suggestion to include the R² values; we agree that this would be beneficial.
5. This is a very good point and we will investigate the effect on variable importance using the additional specifications.
Citation: https://doi.org/10.5194/hess-2023-273-AC3
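The reply above mentions a bootstrapping method without describing it. One common form, shown here purely as an illustrative assumption and not as the authors' procedure, is a percentile bootstrap of an accuracy measure: grid cells are resampled with replacement and the spread of the recomputed RMSE is reported. The function name, parameters and toy data are hypothetical.

```python
import math
import random

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def bootstrap_rmse_interval(obs, pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE, resampling cells with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(obs)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Hypothetical toy values for observed and predicted irrigated fraction.
observed = [0.0, 0.1, 0.4, 0.8, 0.2, 0.0, 0.6, 0.3]
predicted = [0.05, 0.1, 0.3, 0.9, 0.25, 0.0, 0.5, 0.35]

low, high = bootstrap_rmse_interval(observed, predicted)
print(f"RMSE: {rmse(observed, predicted):.3f}, 95% interval: [{low:.3f}, {high:.3f}]")
```

Resampling individual cells treats them as independent, which ignores spatial dependence between neighbouring cells; a block-wise resampling scheme would be an alternative if that matters for the comparison.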
Data sets
Code files Sophie Wagner https://github.com/SophieWag/isea3h_irrigation
Model code and software
Data and code files Sophie Wagner https://doi.org/10.5281/zenodo.10012830
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
396 | 130 | 42 | 568 | 37 | 33 |