the Creative Commons Attribution 4.0 License.
A global assessment of nitrogen concentrations using spatiotemporal random forests
Abstract. Anthropogenic nitrogen fluxes into surface freshwater bodies significantly impair water quality (WQ), pose serious health hazards, and create critical environmental threats. Quantifying the magnitude and impact of WQ issues requires identifying the key controls of nitrogen dynamics and assessing past and future patterns of global nitrogen flows. To achieve this, we adopted a data-driven, machine learning approach to build a space-time random forest model for simulating nitrogen concentration in 115 major river basins of the world. The proposed random forest-based WQ model regressed the monthly measured nitrogen concentration collected at 718 river stations across the globe for the period 1992–2010 onto a set of 17 predictor variables at a spatial resolution of 0.5°. The resulting model was validated with data from river basins outside the training dataset, and was used to predict nitrogen concentrations in all river basins globally, including many with scarce or no observations. We predict that the regions with the highest median nitrogen concentrations in their rivers (in 2010) were the United States, India, Pakistan, Bangladesh, China, and most of Europe. Furthermore, our results showed that the rate of increase between the 1990s and the 2000s was greatest in rivers located in eastern China, eastern and central Canada, the Baltic states, southern Finland, Pakistan, parts of Russia, mainland Southeast Asia, and south-eastern Australia. We found that, globally, the most influential predictors of nitrogen concentrations are temporal: month of the year and cumulative month count, reflecting the secular trend. Apart from temporal variables, cattle density, nitrogen fertilizer application, temperature, precipitation, and pig population are the most influential predictors of nitrogen pollution of the river systems.
The proposed global WQ model will provide a new tool to explore agricultural and land management strategies designed to reduce nitrogen pollution in freshwater bodies at large spatial scales.
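The two temporal predictors named in the abstract, month of the year and cumulative month count, can be derived directly from each sampling date. The sketch below is illustrative only: it assumes the count starts at 1 in January 1992 (the start of the study period), and the function name `temporal_features` is hypothetical rather than taken from the authors' code.

```python
from datetime import date

# Assumed origin of the cumulative month count: January 1992 (counted as month 1).
ORIGIN_YEAR, ORIGIN_MONTH = 1992, 1


def temporal_features(d, origin_year=ORIGIN_YEAR, origin_month=ORIGIN_MONTH):
    """Return (month_of_year, cumulative_month_count) for a sampling date."""
    month_of_year = d.month
    # Whole calendar months elapsed since the origin, plus one so the origin month is 1.
    cumulative = (d.year - origin_year) * 12 + (d.month - origin_month) + 1
    return month_of_year, cumulative


print(temporal_features(date(1992, 1, 15)))  # (1, 1)
print(temporal_features(date(2010, 12, 3)))  # (12, 228)
```

Changing the origin shifts every count by a constant, so the same origin has to be used when computing predictors for training and for any later prediction period.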
Status: closed
RC1: 'Comment on hess-2021-618', Anonymous Referee #1, 05 Feb 2022
Overall, I was not impressed by the methodological advancements or the scientific implications of this project. The authors built a standard ML algorithm, random forests, and applied it to a global nitrogen dataset. The portrayal of it as a spatiotemporal model is misleading: it is a standard random forest that uses month and lat/lon as additional predictors, which is perfectly fine but not a novel type of random forest model. The model appears to perform well on held-out data, so the authors take that as evidence to then apply it globally and make maps. From those maps and the model itself, the authors draw generally banal conclusions that have been recorded elsewhere. The 'so what' of the paper from the discussion section is that it could conceivably be used by other stakeholders in applications, but this link felt very light, so I am left skeptical of any pathway to impact.
Specific feedback:
The literature review of ML methods in water quality (Section 1.2) does not powerfully motivate the present study. Why are the “three critical observations” listed interesting and relevant? This section overall feels disconnected from the rest of the paper. The authors also fail to discuss why so few papers attempt to apply their models at a global scale (extrapolation risks) and make it seem that the community just never tried before, which is not the case.
Broadly, the motivation for this work is not clear and compelling. “The primary goal of this study is to introduce a global WQ model that is based on ML approach” (Section 1.3). There are hundreds of water quality models based on ML, as the authors mention in the previous section. This paper is therefore not really about providing a new dataset but a new model. Yet the model is relatively off the shelf and does not tell us anything about the systems that we do not already know. Overall, building an ML model simply because we can is not a compelling motivation in 2022.
Observational data: Why does the data collection period end in 2010? Data continue to be collected, so this seems an arbitrary cutoff that potentially excludes more data. Second, the stations used to build the model have large geographic disparities that the authors do not discuss at length (e.g., the abundance of sites in Brazil and Europe). Sampling bias by location is a major consideration when applying the model globally. I think this manuscript presents an extrapolation, unsupported by the validation shown, of a model to locations far different from those used to train it. The authors gloss over this critical consideration when making the main global maps (Figs. 6–8).
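One simple way to make the extrapolation concern concrete is to flag prediction cells whose predictor values lie far from every training station. The sketch below is a simplified nearest-neighbour distance check in standardized predictor space, not the method used in the manuscript; the threshold (the largest nearest-neighbour distance among training rows) and all function names are assumptions chosen for illustration.

```python
import math


def column_stats(rows):
    """Per-column mean and sample standard deviation (1.0 for constant columns)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, sds


def standardize(rows, means, sds):
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def nearest_distance(point, others):
    return min(math.dist(point, o) for o in others)


def flag_extrapolation(train_rows, new_rows):
    """True for each new row that lies farther from every training row than any
    training row lies from its own nearest neighbour (assumed threshold)."""
    means, sds = column_stats(train_rows)
    z_train = standardize(train_rows, means, sds)
    z_new = standardize(new_rows, means, sds)
    threshold = max(
        nearest_distance(p, z_train[:i] + z_train[i + 1:]) for i, p in enumerate(z_train)
    )
    return [nearest_distance(p, z_train) > threshold for p in z_new]


train = [[0, 0], [1, 0], [0, 1], [1, 1]]  # toy predictor rows at training stations
print(flag_extrapolation(train, [[0.5, 0.5], [10, 10]]))  # [False, True]
```

Mapping the resulting flags alongside Figs. 6–8 would show which predicted cells fall outside the sampled predictor space.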
Predictors data: The authors say they started with a list of 27 candidate variables but then reduced it by more than half, to around 13 variables, to “reduce…redundant information”, yet one of the key advantages of random forests is that they work well with highly correlated variables. What were the other variables considered but ultimately not included? Were the datasets aggregated over the watershed boundaries corresponding to each sampling location (for variables like precipitation and runoff they need to be)? Land cover is also known to be relevant, but only cropland area was included.
Model development: The novelty of this random forest methodology is greatly overemphasized. There is research into spatiotemporal random forests, but those methods are far more advanced than what was applied here, which makes the title of the paper misleading. This is an off-the-shelf random forest that anyone taking a Coursera data science course could apply successfully. To clarify, I am fine with the algorithm but troubled by the emphasis on its importance and novelty. Including latitude and longitude as predictors hardly makes this a spatial statistical model, and including month of the year hardly makes it a time series model.
Testing set: I would like to see sites completely held out as well to see how well the model predicts at new locations.
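Holding out whole sites means grouping the split by station (or by basin) so that no station contributes records to both sets. A minimal sketch, assuming each record is a dict carrying a hypothetical `station_id` key:

```python
import random


def split_by_group(records, key="station_id", test_fraction=0.2, seed=0):
    """Split records so that each group (station or basin) falls entirely in train or test."""
    groups = sorted({r[key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[key] not in test_groups]
    test = [r for r in records if r[key] in test_groups]
    return train, test


# Toy data: 10 hypothetical stations with 3 monthly records each.
records = [{"station_id": f"S{s}", "month": m} for s in range(10) for m in range(3)]
train, test = split_by_group(records)
print(len(train), len(test))  # 24 6
```

Passing `key="basin_id"` (again a hypothetical field name) would give the basin-level hold-out described in the abstract.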
Model evaluation: What is the distribution of R² values by location? Presumably some locations perform better than others. Also, the metrics are reported on a log-transformed scale. What is the mean absolute error or root-mean-square error in interpretable units (mg/L)? A strong-performing model in log-log space is quite easy to produce (across domains, not just water quality), so it is important to report performance metrics in the back-transformed data space relevant to decision makers.
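The back-transformed metrics and the per-location R² requested above could be computed as sketched below. This assumes the model was fitted to natural-log concentrations in mg/L; if a different base or an offset was used, the back-transform must change accordingly. Exponentiating a log-scale prediction gives an estimate closer to the median than the mean, so a bias correction may also be worth considering.

```python
import math
from collections import defaultdict


def back_transformed_errors(log_obs, log_pred):
    """MAE and RMSE in mg/L, assuming inputs are natural logs of concentrations in mg/L."""
    errors = [math.exp(p) - math.exp(o) for o, p in zip(log_obs, log_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def r2(obs, pred):
    """Coefficient of determination; NaN when the observations are constant."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


def r2_by_site(site_ids, obs, pred):
    """R² computed separately for each site, to show its distribution across locations."""
    groups = defaultdict(lambda: ([], []))
    for s, o, p in zip(site_ids, obs, pred):
        groups[s][0].append(o)
        groups[s][1].append(p)
    return {s: r2(o, p) for s, (o, p) in groups.items()}


print(back_transformed_errors([math.log(1.0), math.log(2.0)], [math.log(1.5), math.log(2.0)]))
print(r2_by_site(["A", "A", "B", "B"], [1, 2, 3, 5], [1, 2, 4, 4]))  # {'A': 1.0, 'B': 0.0}
```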
Model evaluation (contd): I would like to see comparison of this model to benchmark models. For example, how does this compare against a simple linear regression? Against a mixed effects regression? Against simply fitting linear trends independently at each site? Not all of these need to be done, but some sort of well selected benchmarks are needed to contextualize model performance.
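The simplest benchmark mentioned, an independent linear trend at each site, might look like the sketch below. Records are assumed to be hypothetical `(site_id, t, y)` tuples, where `t` could be the cumulative month count and `y` the (log) concentration. Note that this benchmark can only predict at sites that appear in the training data.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # a single time point gives a flat line
    return slope, my - slope * mx


def per_site_trend_benchmark(train, test):
    """Fit y = a + b * t separately at each site; predict test rows (None for unseen sites)."""
    by_site = defaultdict(lambda: ([], []))
    for site, t, y in train:
        by_site[site][0].append(t)
        by_site[site][1].append(y)
    coefs = {s: fit_line(ts, ys) for s, (ts, ys) in by_site.items()}
    preds = []
    for site, t, _ in test:
        if site in coefs:
            slope, intercept = coefs[site]
            preds.append(intercept + slope * t)
        else:
            preds.append(None)
    return preds


train = [("A", 1, 1.0), ("A", 2, 2.0), ("A", 3, 3.0)]
print(per_site_trend_benchmark(train, [("A", 4, None), ("B", 4, None)]))  # [4.0, None]
```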
Model evaluation (contd): How do the performance metrics compare with those of similar nitrogen modeling studies? If this model is the core advancement of the paper, its performance relative to other literature has to be clear and impressive. It is unclear at this point whether that is the case.
Model interpretation: The variable importance feature is interesting, but I want to see the influence of each variable on the outcome to check they make scientific sense. Otherwise the model could be getting it ‘right’ for the ‘wrong’ reasons. Partial dependence plots or the like are one way to plot those dependencies and could provide more interesting scientific findings rather than the surficial relationships presented so far in the paper.
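A one-dimensional partial dependence curve is model-agnostic: fix one predictor at each value on a grid, keep all other predictors as observed, and average the model's predictions. The sketch below accepts any `predict` callable and assumes rows are Python lists; the toy model is a stand-in, not the fitted random forest. scikit-learn users could instead rely on `sklearn.inspection.PartialDependenceDisplay`.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction with one predictor fixed at each grid value (others as observed)."""
    curve = []
    for value in grid:
        modified = [row[:feature_index] + [value] + row[feature_index + 1:] for row in rows]
        preds = predict(modified)
        curve.append(sum(preds) / len(preds))
    return curve


# Toy stand-in for a fitted model: prediction = 2 * x0 + x1.
def toy_predict(rows):
    return [2 * r[0] + r[1] for r in rows]


print(partial_dependence(toy_predict, [[0, 1], [5, 3]], 0, [0, 1, 2]))  # [2.0, 4.0, 6.0]
```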
Literature discussion: Overall it did not seem like the results were sufficiently contextualized in the literature. This goes for the performance metrics and the identification of increasing/decreasing trends in certain regions. Several of these regions have already been identified as having increasing/decreasing trends so how do these results build off of (or contradict) the prior literature?
Figure 3 is not helpful, perhaps move to SI if authors feel it is relevant.
Figures 4 and 5: What do the observed and predicted look like in original units? If this model and data outputs will ultimately be useful, it has to perform well in the original units. Figure 5 (test data) is more relevant than Figure 4 (training data), so Fig 4 could go to the SI.
Figures 6–8: I worry considerably about extrapolation, so I do not trust the majority of locations shown. Also, could these maps be accompanied by uncertainty maps?
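One inexpensive proxy for an uncertainty map from a random forest is the spread of predictions across its individual trees in each grid cell (in scikit-learn, the fitted trees are available through the `estimators_` attribute of `RandomForestRegressor`). The sketch below summarizes one cell's per-tree predictions; the 5th to 95th percentile band is an arbitrary choice, and this spread reflects disagreement among trees rather than a calibrated prediction interval (quantile regression forests are one alternative).

```python
import statistics


def tree_spread(per_tree_predictions):
    """Median and 5th/95th percentiles of the per-tree predictions for one grid cell."""
    cuts = statistics.quantiles(per_tree_predictions, n=20, method="inclusive")
    return statistics.median(per_tree_predictions), cuts[0], cuts[-1]


# Toy example: 101 hypothetical per-tree predictions for a single cell.
print(tree_spread(list(range(101))))  # (50, 5.0, 95.0)
```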
Figure 8: Adds little not shown elsewhere.
Figure 9: Why do they find time series is more relevant? Is that surprising? Is it interesting that cattle is ranked where it is? The ‘so what?’ is missing here.
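Permutation importance, one common way of producing rankings like those in Figure 9, measures how much prediction error grows when a single predictor column is shuffled. The sketch below is a generic version, not necessarily the importance measure used in the manuscript. When two predictors are strongly correlated, shuffling one leaves much of its information available through the other, which can lead correlated variables to receive different, and individually understated, importances.

```python
import random


def permutation_importance(predict, rows, y, feature_index, n_repeats=5, seed=0):
    """Mean increase in mean squared error after shuffling one predictor column."""
    def mse(pred):
        return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

    rng = random.Random(seed)
    baseline = mse(predict(rows))
    increases = []
    for _ in range(n_repeats):
        column = [row[feature_index] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:feature_index] + [v] + row[feature_index + 1:]
                    for row, v in zip(rows, column)]
        increases.append(mse(predict(shuffled)) - baseline)
    return sum(increases) / n_repeats


# Toy model that uses only the first predictor; the second is ignored.
def toy_predict(rows):
    return [r[0] for r in rows]


rows = [[i, i % 3] for i in range(10)]
y = list(range(10))
print(permutation_importance(toy_predict, rows, y, 0))
print(permutation_importance(toy_predict, rows, y, 1))  # 0.0
```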
Citation: https://doi.org/10.5194/hess-2021-618-RC1
AC1: 'Response to Comments from Reviewer #1', Razi Sheikholeslami, 30 Mar 2022
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-618/hess-2021-618-AC1-supplement.pdf
RC2: 'Comment on hess-2021-618', Anonymous Referee #2, 06 Feb 2022
The manuscript titled “A global assessment of nitrogen concentrations using spatiotemporal random forests” by Sheikholeslami and Hall introduced a machine learning (ML) approach (a random forest model, RF) for predicting in-stream nitrogen (NOx-N) concentrations at the global scale. According to the authors, the novelties of this work are (1) its global-scale application and (2) the spatiotemporal RF approach proposed in this study. In general, the manuscript was well written. Although the results (in-stream NOx-N concentrations) look quite good, there are several points regarding the model/approach used in this study that need to be addressed.
General comments:
1) Representation of nitrogen (N) lag times from input to riverine N export: For water quality (e.g., N) modeling, it is expected that there could be significant N accumulated in the soil as biogeochemical legacy and the long travel time within the unsaturated/groundwater zone that could result in a lag time of years to decades between N input and riverine N export signals (e.g., Meals et al., 2010; Van Meter et al., 2017; Chen et al., 2018). It is unclear to me how the proposed RF model could take into account these factors. From my interpretation of the result, the “cumulative month count” variable (Figure 9) somehow could compensate for this kind of effect. However, we should not try to get the right result for the wrong reason.
2) Variable importance: Why are the month of the year and the cumulative month count the most important variables? I am wondering if the data used in the model has a strong seasonality that makes the variable “month of year” really matter. If this is the case, what is the implication for model application/performance in other areas that have less/no clear seasonality? Is the predictor “cumulative month count” highly important because of an increasing trend in the output variables in many areas (lines 495-497)? If yes, what are the implications from this? Why does “fertilizer application” have a low rank? Why is there not much difference in the variables that were ranked 3rd to 15th (Figure 9)?
3) Spatial unit: For predicting in-stream nitrogen concentrations, it is not clear to me why the authors did not use the river network (instead of grid cells) as the spatial unit. A single grid cell (≈ 55 km × 55 km at the equator for a 0.5° grid) could contain several rivers, so it is unclear whether the predicted values apply to the main river or to tributaries. In addition, for rivers in large basins (e.g., the Elbe, Mississippi, Amazon, and Mekong River Basins) that run across multiple grid cells, it would be useful to incorporate the effect of upstream management and catchment characteristics into the model; it is not clear whether this was considered in the RF model. As I understood from the description, the predictors for in-stream N concentration currently cover only the properties within the grid cell of interest (with no information from upstream grid cells for large rivers).
4) Capacities/limitations section: I would suggest including a section describing the capacities and limitations of the model/approach. Although the authors mention these points briefly in the conclusion section, they could be expanded in a separate section.
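Comment 3 can be made concrete with two small calculations. A 0.5° cell spans roughly 55 km × 55 km at the equator (about 3,100 km²) and shrinks toward the poles, and an upstream-aware predictor can be formed by averaging a variable over each cell and every cell that drains into it, weighted by cell area. The sketch below assumes a hypothetical flow-direction mapping from each cell to its downstream neighbour; it illustrates the idea and does not describe the manuscript's preprocessing.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a latitude-longitude grid cell on a sphere, in km²."""
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))))


def upstream_weighted_mean(downstream, values, areas):
    """Area-weighted mean of a predictor over each cell and all cells draining into it.

    `downstream` maps each cell to its downstream neighbour (None at an outlet) and is
    assumed to be acyclic, as in a hypothetical flow-direction grid.
    """
    totals = dict.fromkeys(values, 0.0)
    weights = dict.fromkeys(values, 0.0)
    for cell in values:
        node = cell
        while node is not None:  # credit this cell to itself and every cell downstream
            totals[node] += values[cell] * areas[cell]
            weights[node] += areas[cell]
            node = downstream[node]
    return {c: totals[c] / weights[c] for c in values}


print(round(cell_area_km2(-0.25, 0.25, 0.5)))  # 3091 (km², 0.5° cell at the equator)
flow = {"a": "c", "b": "c", "c": None}  # two headwater cells draining into an outlet cell
print(upstream_weighted_mean(flow, {"a": 1.0, "b": 3.0, "c": 5.0},
                             {"a": 1.0, "b": 1.0, "c": 2.0}))  # {'a': 1.0, 'b': 3.0, 'c': 3.5}
```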
Specific comments:
Line 30: “In addition, extensive construction of dams, excessive extraction of groundwater, deforestation, and expanding agricultural land use have altered sedimentary processes, mobilization of salts, and nutrient export to river systems, all of which drive WQ deterioration and groundwater pollution in many parts of the world…”. Were these factors considered in the model?
Lines 183–184: What is the temporal resolution of the NOx-N data, and how were they aggregated to a monthly time step?
Line 206: Please indicate where readers can find the list of 27 potential explanatory variables
Line 287: “The second strategy …” The second strategy has not been mentioned before this point
Line 291: “Cumulative Month since 1992”: how sensitive are the results to the start of the month count? This is a critical point if someone wants to run the model for other periods
Line 295: “..17 variables” – Please point out the list of 17 variables
Figure 3: I would suggest adding more frames to the “output” panel (as already done in the “input” panel) to reflect the spatial and temporal properties of the outputs
Lines 418-419: Is there a high correlation between elevation and latitude/longitude?
Lines 426–430: Referring back to Table 1, it is not clear whether these data (livestock population) were assumed to be time-invariant or time-variant
Lines 447–448: “This might be partially due to a high correlation between the agricultural fraction of land area and nitrogen fertilizer use” – Why do highly correlated variables have different rankings?
Table 2: How were annual time step data (especially fertilizer application) disaggregated to monthly time step?
Table 1: Please provide full names for the technical terms (e.g., ANN, DT, MLT,…) in Table 1
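For the question on Table 2, the simplest way to turn an annual total into monthly inputs is a uniform split, optionally replaced by seasonal weights. Both are sketched below as assumptions; the manuscript does not state which approach, if any, was used, and the function name and the weights in the usage example are invented for illustration.

```python
def disaggregate_annual(annual_total, weights=None):
    """Split an annual total into 12 monthly values.

    With no weights the total is spread uniformly; otherwise `weights` (12 non-negative
    numbers, e.g. a hypothetical application calendar) are normalized to sum to one.
    """
    if weights is None:
        weights = [1.0] * 12
    if len(weights) != 12 or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be 12 non-negative numbers with a positive sum")
    total_w = sum(weights)
    return [annual_total * w / total_w for w in weights]


print(disaggregate_annual(120))  # twelve values of 10.0
```

For example, `disaggregate_annual(120, [0, 0, 1, 1] + [0] * 8)` places the whole annual total in March and April.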
References:
Meals, D. W., Dressing, S. A., & Davenport, T. E. (2010). Lag time in water quality response to best management practices: A review. Journal of Environmental Quality, 39(1), 85–96.
Van Meter, K. J., Basu, N. B., & Van Cappellen, P. (2017). Two centuries of nitrogen dynamics: Legacy sources and sinks in the Mississippi and Susquehanna River Basins. Global Biogeochemical Cycles, 31, 2–23.
Chen, D., Shen, H., Hu, M., Wang, J., Zhang, Y., & Dahlgren, R. A. (2018). Legacy nutrient dynamics at the watershed scale: Principles, modeling, and implications. Advances in Agronomy, 149, 237–313.
Citation: https://doi.org/10.5194/hess-2021-618-RC2
AC2: 'Response to Comments from Reviewer #2', Razi Sheikholeslami, 30 Mar 2022
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-618/hess-2021-618-AC2-supplement.pdf
RC3: 'Comment on hess-2021-618', Anonymous Referee #3, 19 Feb 2022
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-618/hess-2021-618-RC3-supplement.pdf
AC3: 'Response to Comments from Reviewer #3', Razi Sheikholeslami, 30 Mar 2022
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-618/hess-2021-618-AC3-supplement.pdf