CAMELS-Chem: Augmenting CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) with Atmospheric and Stream Water Chemistry Data
Abstract. Large sample datasets are transforming hypothesis testing and model fidelity in the catchment sciences, but few large stream water chemistry datasets exist with complementary streamflow, meteorology, and catchment physiographic attributes. Here, we pair atmospheric deposition and water chemistry related information with the existing CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) dataset. The newly developed dataset, CAMELS-Chem, comprises U.S. Geological Survey water chemistry data and instantaneous discharge over the period from 1980 through 2014 in 506 minimally impacted headwater catchments. The CAMELS-Chem dataset includes 18 common stream water chemistry constituents: Al, Ca, Cl, Dissolved Organic Carbon, Total Organic Carbon, HCO3, K, Mg, Na, Total Dissolved Nitrogen [nitrate + nitrite + ammonia + organic-N], Total Organic Nitrogen, NO3, Dissolved Oxygen, pH (field and lab), Si, SO4, and water temperature. We also provide annual wet deposition loads from the National Atmospheric Deposition Program over the same catchments that include: Ca, Cl, H, K, Mg, Total Nitrogen from deposition [precipitation NO3 + NH4, dry deposition of particulate NH4 + NO3, and gaseous NH3], Na, NH4, NO3, and SO4. We release a paired instantaneous discharge (and mean daily discharge) measurement for all chemistry samples. To motivate wider use by the larger scientific community, we develop three example analyses: 1. Atmospheric-aquatic linkages using atmospheric and stream SO4 trends, 2. Hydrologic-biogeochemical linkages using concentration-discharge relations, and 3. Geological-biogeochemical linkages using weathering relations. The retrieval scripts and final dataset of > 412,801 individual stream water chemistry measurements are available to the wider scientific community for continued investigation.
Gary Sterle et al.
Status: final response (author comments only)
RC1: 'Comment on hess-2022-81', Anonymous Referee #1, 08 Apr 2022
- AC1: 'Reply on RC1', Adrian Harpold, 21 Jul 2022
RC2: 'Comment on hess-2022-81', Anonymous Referee #2, 02 Jun 2022
- AC2: 'Reply on RC2', Adrian Harpold, 21 Jul 2022
Evaluating the overall quality of the preprint ("general comments"),
Sterle et al. present a novel compiled dataset of water quality solutes and atmospheric deposition inputs for the CAMELS catchments. Their work augmenting the existing and widely used CAMELS datasets is needed for further research analyzing spatial and temporal water quality trends in minimally impacted watersheds. Existing papers have done large-scale water quality analyses, but few have provided open-access datasets with this breadth of solutes.
At its core, this is a data paper. As such, I think the methods need to be expanded. My comments primarily pertain to the data and methods, as I see that this is the paper's novelty.
The CAMELS dataset is widely used, and the addition of water chemistry provides the opportunity for analysis. However, I think the dataset could be improved substantially. This paper by Sterle et al. provides an excellent resource for the community.
Methods and Results
Length of dataset
In the paper, the dataset is stated to end in 2014 (see Data Comments below because I'm not sure if this is accurate). However, in many cases, solute and discharge data are available until at least the end of the NADP reporting period. Soon 2014 will be 10 years ago, and I worry the data will quickly become obsolete and not be used to its fullest capacity. The value of this dataset would grow considerably if the authors were able to harmonize data from various agencies.
The USGS has developed methods to generate the longest timeseries possible for watersheds. However, I do not see these methodologies applied to these CAMELS watersheds.
Three different approaches can be used alone or in concert to expand the dataset.
Figures capture the various hydrological/biogeochemical metrics for select solutes in section 4. However, as a user, I find it challenging to evaluate whether the data would be sufficient for my use case. The authors have included some analysis of data coverage in section 3.1. However, since the strength of this paper lies in the data, there could be more information to help users evaluate whether this dataset suits their needs.
There should be summary figures for all solutes, so users can adequately assess whether the dataset is appropriate for their use. I think the paper would benefit tremendously if the dataset had more metadata and signatures/summary statistics. Specifically, in Addor et al. (2017), the authors summarized many indices and described them in great detail (see Table 2 and Table 3 in Addor et al.). I suggest summarizing information about (1) missing data/data gaps, years of continuous data, (2) low/high flow distribution, (3) the portion of the flow duration curve (FDC) that the water quality (WQ) samples span (Figure 7), (4) seasonality of hydrology and solutes, and other metrics that the authors deem useful.
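Item (3) could be reported per catchment and solute along the following lines. This is only a minimal sketch of one possible metric, not the authors' method: the function names are hypothetical, and it assumes that a daily discharge record and the discharges paired with the chemistry samples are available as plain lists in the same units.

```python
from bisect import bisect_left


def exceedance_probabilities(daily_flows, sample_flows):
    """Empirical exceedance probability of each sampled discharge.

    The exceedance probability of a discharge q is taken here as the
    fraction of daily flows that are greater than or equal to q.
    """
    if not daily_flows:
        raise ValueError("daily_flows must not be empty")
    ordered = sorted(daily_flows)
    n = len(ordered)
    # bisect_left counts the daily flows strictly below q.
    return [(n - bisect_left(ordered, q)) / n for q in sample_flows]


def fdc_coverage(daily_flows, sample_flows):
    """Return (lowest, highest) exceedance probability spanned by the samples."""
    if not sample_flows:
        raise ValueError("sample_flows must not be empty")
    probs = exceedance_probabilities(daily_flows, sample_flows)
    return min(probs), max(probs)
```

A narrow interval would indicate that the samples cover only part of the flow regime, for example baseflow conditions only.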
These comments pertain to the files on the google drive.
Data provided has inconsistencies in the way the date is reported. Example from camel_chem_v3.
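One way to harmonize a date column before release is sketched below. The formats actually present in the file are not reproduced in this comment, so the candidate formats listed in the code are assumptions chosen for illustration, and the function name is hypothetical.

```python
from datetime import datetime

# Assumed candidate formats; replace with the ones actually found in the file.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def to_iso_date(raw):
    """Return the date in ISO 8601 form (YYYY-MM-DD) or raise ValueError."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Raising on unrecognized values, rather than guessing, keeps ambiguous entries visible so they can be checked by hand.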
It appears that the data available in camel_chem_q_v1 ends in 2018. Please update the manuscript with the correct dates if this data is available.
The original CAMELS dataset provides shapefiles. To allow for seamless merging of data, the header of the column that holds the watershed IDs should be the same as the name used by the original CAMELS dataset.
The dataset provided should be able to stand on its own without needing the other CAMELS datasets. The watershed metadata should thus be included (area, outlet latitude and longitude, USGS gage number with leading 0s).
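The leading-zero issue matters because gage numbers read as integers lose their zeros and then no longer match across files. The sketch below illustrates the idea under stated assumptions: the key name `gauge_id`, the 8-digit width, and both function names are illustrative choices, not taken from the dataset files.

```python
def normalize_gage_id(raw, width=8):
    """Return a gage number as a zero-padded string, e.g. 1013500 -> '01013500'."""
    text = str(raw).strip()
    if not text.isdigit():
        raise ValueError(f"not a numeric gage ID: {raw!r}")
    return text.zfill(width)


def join_on_gage_id(chem_rows, attr_rows, key="gauge_id"):
    """Attach catchment attributes to chemistry rows that share a gage ID.

    Both inputs are lists of dicts. Chemistry rows without a matching
    attribute row are kept, with only the normalized ID added.
    """
    attrs_by_id = {normalize_gage_id(row[key]): row for row in attr_rows}
    joined = []
    for row in chem_rows:
        gage = normalize_gage_id(row[key])
        merged = dict(attrs_by_id.get(gage, {}))
        merged.update(row)
        merged[key] = gage  # store the padded string form
        joined.append(merged)
    return joined
```

Storing the ID as text in the released files would avoid the need for this step on the user side.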
Individual scientific questions/issues ("specific comments"),
Line 84-87: Authors state that they have WQ data from 1980-2014. Later the authors state that the data is "for the same time period" as NADP data (1985-2019). Dates should be consistent.
Line 103: Section 2.2 should be written in a high-level abstract way. I find it unclear how these frameworks are applied to this data in the way it is currently written. It would be better to see more specificity and the names of the databases (e.g., NADP, NWIS).
Line 125: It is unclear whether "daily average discharge" means a continuous dataset or just discharge measurements for the dates on which there are solute samples. The dataset provided suggests the latter, but I think there would be value in providing the daily discharge timeseries for the same timespan as the solute data.
Line 135: In Table 2, deposition units are reported as mg/L. However, NADP reports their deposition in both concentration and kg per hectare. Are the units in Table 2 a mistake? If not, can you add some detail on the methods used to convert concentration to an area normalized load?
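For reference, the standard unit relation between a precipitation-weighted concentration and an area-normalized wet deposition load can be sketched as follows. Whether the authors applied this conversion is not stated in the manuscript, and the function name is hypothetical. The factor follows from 1 mm of precipitation over 1 ha being 10 m³, or 10,000 L, so 1 mg/L over 1 mm delivers 10,000 mg = 0.01 kg per hectare.

```python
def wet_deposition_load_kg_per_ha(conc_mg_per_l, precip_mm):
    """Convert concentration (mg/L) and precipitation depth (mm) to kg/ha.

    1 mm over 1 ha = 10,000 L, and 1 kg = 1,000,000 mg, so
    load [kg/ha] = conc [mg/L] * precip [mm] * 10,000 / 1,000,000.
    """
    if conc_mg_per_l < 0 or precip_mm < 0:
        raise ValueError("concentration and precipitation must be non-negative")
    return conc_mg_per_l * precip_mm / 100.0
```

For example, a concentration of 1.5 mg/L with 1000 mm of annual precipitation corresponds to 15 kg/ha per year.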
Table 1: Consider adding the NWIS parameter code. For example, is "Nitrate, water filtered" the nitrate plus nitrite (00631) or just nitrate (00618)? There are many parameters for slightly similar solutes and it would help with reproducibility if the parameter codes were included. Also, consider listing the difference between pH in the field and pH in the lab for users.
Table 1: Also consider adding more detail to units. For example, is nitrate mg-NO3/L or mg-N/L?
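The distinction matters because the two conventions differ by more than a factor of four. The sketch below shows the conversion, assuming standard atomic masses for N and O; the function names are hypothetical.

```python
N_ATOMIC_MASS = 14.007  # g/mol
O_ATOMIC_MASS = 15.999  # g/mol
NO3_MOLAR_MASS = N_ATOMIC_MASS + 3 * O_ATOMIC_MASS  # about 62.004 g/mol


def nitrate_as_no3_to_as_n(conc_mg_no3_per_l):
    """Convert nitrate from mg-NO3/L to mg-N/L."""
    return conc_mg_no3_per_l * N_ATOMIC_MASS / NO3_MOLAR_MASS


def nitrate_as_n_to_as_no3(conc_mg_n_per_l):
    """Convert nitrate from mg-N/L to mg-NO3/L."""
    return conc_mg_n_per_l * NO3_MOLAR_MASS / N_ATOMIC_MASS
```

A value of 10 mg-NO3/L is roughly 2.26 mg-N/L, so mixing the two conventions within one column would be easy to miss and would distort any trend or load calculation.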
Line 204: EPA link is broken. I have had many issues with direct links where they are archived and become a dead end. I highly encourage the authors to find a paper with a DOI to support this sentence. As a starting point, you can consider:
Sprague, L. A., Oelsner, G. P., & Argue, D. M. (2017). Challenges with secondary use of multi-source water-quality data in the United States. Water Research, 110, 252–261.
Technical corrections ("technical corrections": typing errors, etc.).