A data-driven method for estimating the composition of end-members from stream water chemistry time series

Xu Fei, Esther; Harman, Ciaran Joseph

doi:https://doi.org/10.5194/hess-26-1977-2022

Articles | Volume 26, issue 8

© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 22 Apr 2022

Final revised paper (published on 22 Apr 2022)
Preprint (discussion started on 15 Jun 2020)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

SC1: 'Great job of A data-driven method for estimating the composition of end-members from streamwater chemistry observations', Jianyu Fu, 01 Jul 2020
- AC1: 'Response to short comment', Ciaran Harman, 11 Aug 2020
RC1: 'Review of Xu Fei & Harman', Joost Delsman, 09 Jul 2020
- AC2: 'Response to Referee 1', Ciaran Harman, 02 Sep 2020
RC2: 'Review Comments', Anonymous Referee #2, 13 Jul 2020
- AC3: 'Response to Referee 2', Ciaran Harman, 02 Sep 2020

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

ED: Publish subject to revisions (further review by editor and referees) (10 Sep 2020) by Genevieve Ali

AR by Ciaran Harman on behalf of the Authors (21 Oct 2020) Author's response Manuscript

ED: Referee Nomination & Report Request started (30 Oct 2020) by Genevieve Ali

RR by Anonymous Referee #3 (06 Dec 2020)

Suggestions for revision or reasons for rejection

The authors proposed an interesting method to decompose stream water into end-members using stream tracer data alone. I like this idea very much. However, the shortcomings of the method are apparent. For example: (1) It relies heavily on the data used. When different stream samples are used, the method is likely to identify different end-members. The number of stream samples, the extreme points included in the input data, the tracers measured in the stream samples, and the seasons during which the stream water was collected all have significant impacts on the outputs. (2) The interactions between the tracer concentrations of end-members and their contributions to stream water were not treated well. An end-member with an extremely high or low tracer concentration may not necessarily produce an extreme concentration in stream water when its contribution is rather low. I wonder whether the current method can identify end-members with low contributions to stream water. (3) It seems that extreme points in the stream tracer concentration series are required to implement the method. If the collected stream water samples do not show any outliers, does the method still work? A mixture of end-members with distinct tracer concentrations does not necessarily produce large changes in the stream water tracer concentration, considering that their contributions change over time. (4) The method may not be able to identify end-members with similar tracer concentrations. This may not be important when we are focusing on the components contributing to the tracer concentration of stream water. But identifying runoff components with similar tracer composition could be very important for understanding changes in hydrological processes in catchments.
Some of these shortcomings have been pointed out by the authors in Section 4.4. But I wonder whether these issues can be solved. For example, for points (1) and (2), how can the constraints on CH-NMF be relaxed? Without appropriate assumptions, the results of the method could be rather doubtful. In my view, the current results, discussion, and explanations are far from convincing enough to be published.

RR by Anonymous Referee #2 (19 Dec 2020)

Suggestions for revision or reasons for rejection

After I examined the revision, my concerns about the validity of this purely mathematical tool have grown rather than diminished. This mathematical tool is surely beautiful, but whether or not it yields hydrochemically meaningful results has not been well demonstrated. No matter how beautifully chemical concentrations in end-members can be inversely derived from streamflow chemistry alone, the tool has to be proved accurate, with foreseeable and acceptable uncertainties. The uncertainty I am talking about here is not the familiar uncertainty arising from chemical analysis but the one caused by this tool per se. There are two questions related to this type of uncertainty, which were raised earlier but not actually addressed in the revision. If the number of samples changes significantly, can chemical concentrations in end-members still be determined within an acceptable range? If non-conservative solutes are included in the analysis, are the chemical concentrations in end-members and the number of end-members determined by this tool consistent and valid? These questions have to be answered with actual data analyses before it becomes a convincing tool. As it currently stands, I do not feel it is ready for others to use with high confidence.

Chemical concentrations in end-members are determined by CHEMMA from the data structure. No doubt a significant change in the number of samples will change the chemical concentrations in end-members. But as long as the determined concentrations are within a certain range, we are fine with it. Simply saying that CHEMMA applies to large data sets without a demonstration is not an appropriate answer. Also, how large must a data set be to count as “large”?

In my opinion, non-conservative solutes should not be included in the analysis, nor can chemical concentrations of non-conservative solutes in end-members be derived from CHEMMA. Otherwise, CHEMMA would need a totally different set-up and would have to deal with chemical equilibrium. With non-conservative solutes included in CHEMMA, however, solutions can be achieved mathematically by increasing the number of end-members. Rick Hooper noticed this problem in EMMA in his later work, which eventually led to the publication of diagnostic tools of mixing models in 2003. Including non-conservative solutes in EMMA will cause the polygon to bend outward. Yes, this problem can be mathematically resolved by adding additional “end-members” to obtain a more complex polygon, but they are not truly end-members because in such cases the assumptions of mixing models are violated. By analogy, the same issue applies to CHEMMA. Why not use DTMM to test whether all solutes included in CHEMMA are conservative and to examine whether non-conservative solutes caused the fourth end-member? Also, why not determine the number of end-members using DTMM and compare it with CHEMMA? Simply citing the published results of Hooper et al. (1990) cannot guarantee that those solutes are conservative, as the conservative behavior of those solutes, along with the number of end-members, was determined subjectively at the time.

I strongly suggest adding data sets from at least one more study site at which both DTMM and EMMA were applied. An additional comparison will significantly increase confidence in CHEMMA. Of course, this addition, along with the uncertainty analysis with a varying number of samples suggested above, may significantly increase the length of the manuscript and make it unsuitable as a technical note. As a matter of fact, I do think this work should be a research article instead of a technical note, as in its current presentation it is not convincing and not ready to be adopted by others.

Also, the English writing has to be improved, not just the grammar but also how effectively and clearly it communicates. I hope Ciaran can step in and give it a final polish before it is resubmitted.

The introduction needs to be rewritten and reorganized. As currently written, it does not properly set up why CHEMMA was developed and is even misleading. Diagnostic tools of mixing models (DTMM) were developed to determine conservative solutes and the number of end-members. To use EMMA (Christophersen and Hooper, 1992), one has to start with DTMM (Liu et al., 2008); otherwise, the mixing model assumptions cannot be reinforced. From the introduction alone, it appears that CHEMMA was developed to substitute for field measurements, which is not true even in the authors’ view.

The discussion provided nothing solid. What can CHEMMA help with, and what can it not? What are the limitations of CHEMMA in terms of applications in hydrologic science?

Let me list a few examples of writing problems in the introduction section. Similar problems occur in other sections.

P2/L24: The statement here is true for “concentrations of conservative solutes”; do not use “chemical composition” as it is too broad and should include non-conservative species.

P2/L28-29: Why not listing all assumptions, e.g., (1) tracers must be conservative, (2) the number of end-members is known, (3) tracer concentrations in end-members are temporally and spatially invariant? Then, in later sections or paragraphs, make it clear how CHEMMA is also constrained by any of these assumptions and how CHEMMA helps to reinforce the assumptions.

P2/L32-33: This statement is misleading or even incorrect. Hooper used experimental data, not real-world data in that study. Instead, you could argue that with an increasing number of study sites having long-term measurements, analysis of the data structure (this may not be a good word choice) of long-term stream water chemistry MAY reveal characteristics of end-members. Later in your manuscript, you need to come back to this argument and discuss if it is valid and in which ways CHEMMA helps.

P2/L49-56: Hooper (2003) was improperly cited. What does DTMM really do? How should EMMA be conducted after DTMM was published? I strongly suggest that authors read carefully a recent article that used both DTMM and EMMA before writing the introduction.

P3/L57-63: Again, DTMM was mis-cited here. It is DTMM that resolved the issue of conservative tracers and the number of end-members. EMMA is not able to deal with non-conservative mixing, which is true, but so is CHEMMA in my opinion. Otherwise, CHEMMA would be a totally different set-up for your study, in which you have to deal with chemical equilibrium.

RR by Anonymous Referee #4 (19 Dec 2020)

Suggestions for revision or reasons for rejection

The authors present a promising method for identifying and characterizing end-members using stream water tracer concentrations only. While I believe their work is a valuable contribution to the existing literature, one major shortcoming is the limited literature review, which fails to relate their work to the existing literature (beyond EMMA). I am aware that the field of end-member mixing modeling is wide; however, the authors fail to acknowledge key hydrology papers such as Carrera et al., 2003 (https://doi.org/10.1029/2003WR002263) or Genereux, 1998 (https://doi.org/10.1029/98WR00010). Likewise, they neglect recently published papers that provide methodological advances in mixing analyses within hydrology, e.g., Beria et al., 2020 (https://doi.org/10.5194/gmd-13-2433-2020), Popp et al., 2019 (https://doi.org/10.1029/2019WR025677), or Barbeta and Peñuelas, 2017 (https://doi.org/10.1038/s41598-017-09643-x). The authors claim that no method exists to characterize missing or unmeasured end-members; however, Popp et al., 2019, have provided an approach for identifying unmeasured end-members.

Another shortcoming is that the method is applied to only one data set. I strongly suggest applying the method to other data sets (using different tracers and in a different geographic setting) to prove its robustness. It has been shown that the tracer set size has a major influence on end-member mixing modeling (Barthold et al.). Testing the method on other datasets should be feasible since many datasets are readily available nowadays.

I really appreciate the detailed description of the methods and the valuable reflections (section 4) added to the latest version (v3). It is also great that the code and data are available in a Jupyter notebook.

The manuscript is well written and clearly structured; however, readability should be improved by correcting a few grammatical flaws (e.g., articles are often missing) that I believe I have detected.

Comments:
• L. 4 and throughout the text: consider replacing “candidate” with “potential”
• L. 24: replace “should” with “can” be explained
• L. 28: the hypotheses
• L. 29: I would rephrase it saying that the “1) stream water consists of the identified end-members and 2) all end-members were identified correctly”.
• L. 63: please add a reference to the 4th statement
• L. 67-68: this statement is not true. See comment above about method provided by Popp et al., 2019, (https://doi.org/10.1029/2019WR025677)
• L. 74-76: I would really appreciate to see this method applied to other datasets
• L 92: “end-member mixing” approach/method?
• L. 96: subspace
• L. 152: analysis many times (instead of “a large number of”)
• L. 163: can you please specify “reasonable”?
• L. 181 following: please statistically quantify the similarity between field measurements and your values. E.g., not only alkalinity (hillslope) seems to differ considerably. Also, consider removing decimal points in Table 1 given the high st.dev.
• L. 204: please use a statistical test to quantitatively describe how well you can reproduce end-members of Hooper et al.
• L. 225: that is a great suggestion! You could also indicate that a time lag representing a delay caused by tracer transport from the source to the output (see Beria et al.) adds uncertainty.
• L. 232: I would remove this statement.
• increase font size in Fig. 3
• Figs. 2 and 3 and table 2: specify what is meant by “organic”

ED: Publish subject to revisions (further review by editor and referees) (21 Dec 2020) by Genevieve Ali

Dear authors,

I have heard back from three reviewers regarding your revised manuscript. One reviewer had commented on your original submission, while two others had not. You will see that while all reviewers find your mathematical approach quite elegant and interesting in itself, they also find that your manuscript does not clearly demonstrate how it can lead to hydrochemically relevant results. The returning reviewer (from the first round of review) found that some of their original comments have not been (well) addressed in your revision. One reviewer underlined that your literature review omits several key pieces of the literature, while others found that your decision to use non-conservative tracers was not well justified, and that your comment about using "large" datasets – without quantifying what "large" is – was unclear. Two reviewers suggested testing your approach on multiple datasets to fully evaluate its robustness/relevance. One reviewer even suggested that your manuscript be converted to a full-size research paper to afford you the space to really present and detail your approach in a more convincing way.

Given these comments, and others that are detailed in the three reviewer reports (and sometimes recurrent across reviewer reports), I am returning your manuscript for revisions. When I receive your newly revised technical note, I will send it back for review. It will be important for all major concerns to have been fully addressed in your twice-revised technical note before a positive recommendation regarding publication can be made.

Alternatively, you may also decide to follow the suggestion made by one reviewer and go for a full-size manuscript instead of a technical note. I believe that without the strict length constraints associated with the technical note, it may be easier for you to fully address the comments raised by the reviewers. If you were to decide to go that route, you would just need to let me know so that I can check with Copernicus about how to switch manuscript type.

With very best wishes,

Genevieve Ali

AR by Ciaran Harman on behalf of the Authors (02 Jun 2021) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (03 Jun 2021) by Genevieve Ali

RR by Anonymous Referee #4 (30 Jun 2021)

RR by Anonymous Referee #2 (01 Jul 2021)

Suggestions for revision or reasons for rejection

Major Concerns

The conservative behavior of the tracers used in the analysis was not tested. I had this concern before, but it was not addressed adequately or correctly in the current revision. I suggested running DTMM first to determine the conservative behavior of all solutes (e.g., completely conservative, semi- or quasi-conservative, and non-conservative under a lower dimensionality) and then including the conservative ones in CHEMMA. Instead, the authors ran DTMM after CHEMMA and just compared the outcomes. They found that the residuals of sulfate, magnesium, and calcium still maintain some structure in a two-dimensional mixing space and that the residual structures persist until the dimension goes beyond five (Figure 4 b). If this result is true (I am not sure whether the Shapiro-Wilk test is appropriate for this analysis; see my comment on Figure 4 below), it strongly indicates that some solutes are not completely conservative. With six solutes, you cannot go beyond five dimensions to determine conservative solutes and the number of end-members (see my comment in the second round of revision). The authors misunderstood how DTMM works.

With that said, I do not mean that you cannot run CHEMMA with all six solutes together or with different groups of solutes. Instead, I encourage the authors to run different versions of CHEMMA with various combinations of solutes (following the DTMM results) and compare the outcomes, including the number of end-members.

For a research article, I strongly suggest running DTMM first (following Hooper 2003), then EMMA (similar to Christophersen and Hooper 1992), and finally comparing with the results of CHEMMA (e.g., groups of all six solutes and conservative solutes identified by DTMM as mentioned above). The comparison should not be limited to the end-member composition, but should include the number of end-members, the fractional contributions of end-members, and the end-member distances. For example, how do the end-member distances from the end-member composition of CHEMMA compare to those of the measured ones? Is there an improvement in the end-member distances with CHEMMA? I did not keep track of whether Hooper (2003) used exactly the same data set as Hooper (1990). If so, you may not need to re-run both DTMM and EMMA but can just summarize their results.

The addition of a test with varying sample sizes (Figure 6) is nice and very much appreciated. One result is very promising (e.g., relatively stable compositions for end-members 2 and 3), but others are not (e.g., significant variation in the composition of end-member 1; still many outliers for end-members 2 and 3). Together with the significant variability in identifying end-members of the synthetic data (Figure 8) and the high uncertainties of the algorithm (Figure 7), this indicates that the data structure or the distribution of sample points determines the end-member composition. The role of CHEMMA in end-member mixing analysis is limited. This limitation should be explicitly discussed and stated in the abstract and conclusion. This does not downplay CHEMMA’s value, but simply tells the truth so that future users will not be misled. As a matter of fact, CHEMMA would be very helpful in identifying a missing end-member, guiding field sampling of end-members, and generating a hypothesis test.
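The residual check that the referee asks for can be sketched as follows. This is a minimal illustration, not the authors' or Hooper's code: it assumes the diagnostic consists of standardizing the stream concentrations, projecting them onto the leading principal components, back-transforming, and inspecting the residuals per solute. The function names, the synthetic end-member values, and the use of a relative RMSE as the summary statistic are all assumptions made for the example.

```python
import numpy as np

def mixing_space_residuals(stream, n_dims):
    """Residuals (concentration units) after projecting the samples onto the
    first n_dims principal components of the standardized stream data."""
    mean = stream.mean(axis=0)
    std = stream.std(axis=0, ddof=1)
    z = (stream - mean) / std                      # standardize each solute
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    v = vt[:n_dims]                                # leading eigenvectors (rows)
    predicted = (z @ v.T @ v) * std + mean         # project, then de-standardize
    return stream - predicted

def relative_rmse(stream, n_dims):
    """Per-solute RMSE of the residuals divided by the mean concentration."""
    residuals = mixing_space_residuals(stream, n_dims)
    return np.sqrt((residuals ** 2).mean(axis=0)) / stream.mean(axis=0)

# Hypothetical data: 200 samples mixed conservatively from 3 end-members, 4 solutes.
rng = np.random.default_rng(0)
end_members = np.array([[1.0, 10.0, 5.0, 2.0],
                        [8.0, 2.0, 1.0, 9.0],
                        [4.0, 6.0, 9.0, 3.0]])
fractions = rng.dirichlet(np.ones(3), size=200)
stream = fractions @ end_members

# Three end-members span a 2-D mixing space, so residuals should vanish at n_dims=2.
rrmse_by_dim = {m: relative_rmse(stream, m) for m in (1, 2, 3)}
```

With conservative mixing of three end-members, the residuals collapse once two dimensions are retained. Solutes whose residuals keep a visible structure at that dimensionality would be the candidates for non-conservative behavior.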

Moderate Comments/Concerns

Abstract:
The manuscript has been modified, with a new section (re: synthetic dataset) and a new analysis (re: varying number of samples) added, but the abstract was not updated to include the results from these analyses, nor was the limitation of the approach stated in the abstract.

Introduction:
Somewhere in the introduction, all of the mixing model assumptions need to be explicitly listed (i.e., i. Tracers used in the mixing model must be conservative; ii. The number of end-members is known; iii. End-member compositions must be distinct for at least one tracer; iv. End-member compositions are spatiotemporally constant or their variations are known or treated as different end-members). These assumptions should be discussed, e.g., which ones have been addressed by which tools and which ones are still open research questions. In my opinion, the first two assumptions have been resolved by the diagnostic tools of mixing models, which has to be acknowledged out of respect for the earlier study. This will pave a clear pathway for your own research. However, this does not mean DTMM cannot be challenged or improved.
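The four assumptions can be read against the standard linear mixing model. The notation below is an assumption made for illustration and is not taken from the manuscript: $c_{ij}$ is the concentration of tracer $j$ in stream sample $i$, $e_{kj}$ is the concentration of tracer $j$ in end-member $k$, $f_{ik}$ is the fraction of sample $i$ contributed by end-member $k$, and $K$ is the number of end-members.

```latex
c_{ij} = \sum_{k=1}^{K} f_{ik}\, e_{kj},
\qquad \sum_{k=1}^{K} f_{ik} = 1,
\qquad f_{ik} \ge 0 .
```

In this form, assumption (i) is what makes the sum linear, (ii) fixes $K$, (iii) requires any two end-members to differ in at least one tracer $j$, and (iv) is what allows $e_{kj}$ to carry no sample index $i$.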

Saying that including non-conservative solutes in the mixing models has not been resolved is inappropriate and misleading, as non-conservative solutes should not be included in the analysis based on the mixing model assumptions. This does not mean you cannot challenge the assumptions, but I do not think that is what your study aimed at.

Results:
How do the fractional contributions compare between your and Hooper’s results? Were the fractional contributions of the fourth and fifth end-members significant compared to the three end-members used by Hooper? How do the end-member distances change with the end-member composition of CHEMMA?

Figure 4:
Is the Shapiro-Wilk test appropriate for this analysis, given that it is usually used to test for normality?

Miscellaneous Comments and Suggested Edits

Title: Get rid of “observations”, which I think is redundant.

P1/L1: Delete “, and is”.

P1/L2-3: This statement is not completely true, not exactly part of the mixing model assumption (see one of my comments above). As long as the temporal/spatial variation is known, a hydrograph separation is still valid.

P1/L4: Change “additional measurements” to “samples”.

P1/L5 and also L12: I think this article is no longer a technical note.

P1/L16: Delete “profile”.

P1/L17-18: This is not exactly true. The end-member composition does not have to be constant. Delete the phrase.

P2/L26: Again, it does not have to be temporally invariant.

P2/L26: Change “observations” to “samples”.

P2/L28-30: This is probably where you state all the assumptions of mixing models as I suggested earlier.

P2/L33-35: Any citation(s) for this statement? This is the most important statement to justify your study.

P2/L40: Add “estimate uncertainties of ” after “to” if those are true.

P3/L61-63: This is only one of a few criteria used to screen end-members (see Hooper, 2003; Liu et al., 2008 and 2020).

P3/L63-65: This is not what Hooper (2003) means. Instead, he demonstrated the failure of using the rule of one. He suggested using the residual distribution pattern. Indeed, their 1992 paper followed the rule of one (as you stated in the sentence following this one). You have to follow the temporal evolution of EMMA and cannot use an early one to reject a later one.

P3/L69-74: With DTMM, the #2 is not true. The #3 should not be stated as mixing is subject to conservative tracers and so is CHEMMA as I talked above. I do not think CHEMMA can include non-conservative solutes.

P3/L81-82: Change “allows for identification of” to “aims at identifying”, as by far you have not yet demonstrated if you can.

P3/L84-85: I think what you want to say here is that end-member composition does not have to be distinct for all tracers (assumption iii above).

P4/L95: Change “find” to “determine”.

P4/L113: Why not run DTMM before CHEMMA?

P4/L114: Based on Figure 1, “…projected into the 2D subspace spanned by pair-wise PCs”?

P4/L115: Add “at each 2D subspace” at the end of the sentence. Then, get rid of the middle sentence if “pair-wise” is added as suggested above.

P5/L (not clear which line but the statement following “Result”): Does x-matrix contain the standardized values or original concentrations? Need to specify.

P5/L (#4 in the table): Change “needed” to “found” because the number is variable and it does not matter for how many to be found.

P5/L (#5 in the table): I am sure both I-vector and J-vector were explained in the text, but in the table their function still needs to be specified so that readers understand what SI-matrix means.

P5/L (#6 in the table): Need to say that h and H represent fractional contributions.

P5-6: After reading the text, it is still not clear how both I- and J-vectors were generated. Through an optimization procedure itself?

P6/L158-163: Kind of arguments are needed to set the stage for multiple runs. But the exact statements here fit better to discussion.

P7/L187: You cannot just cite Hooper et al. (1990). DTMM has to run to identify conservative solutes and the number of end-members.

P7-8/L193-215: Most if not all of them should be presented under 3.1.

P9/L253-259: Some of them are very much speculative.

P11/L325-330: These are not conclusion.

Figure 1: What are the red crosses in c? Why are they in different positions from those in b?
Also, need to say this uses 3D as an example.

Figure 2: Do not use the same colors for the PC axes of solutes as for the triangles; way too confusing. You used “diamonds” not “squares”. In the case of four end-members, this is not exactly the convex hull but the convex hull projected into a 2D PC subspace. For four end-members, it needs three PC subspaces. This should be specified.

Figure 3: Do the colors shown in legend match those points on the plot? I do not see blue and red colors in the legend.

Also, the same legend is used in all cases regardless of the number of end-members. Is it necessary to show all four (maybe five) when you are seeking only three end-members?

Figure 4: Not self-explanatory. Averaged from 100 runs? Residuals of what? What does five-fold cross validation mean?

Font size too small; resolution too low; symbol size too small too.

Very hard to distinguish Ca and Si curves.

Figure 5: Observed in both stream water and end-members? Do not use "observation" for "stream water" as observed end-members were also shown.

Figure 6: You mean "the same size ..."? Also, change "sample" to "samples". Font size too small.

Figure 7: How were algorithm and data uncertainties calculated? I do not think they were covered in the text.

Figure 8: (top) Different number of samples? Or, different sample distributions? (caption) Do they all have the same number of samples?

Figure 9: I cannot follow the definition of "percent end-member limited". If samples were generated from three synthetic end-members with the constraints that each fraction lies within 0-1 and all sum to 1, how can some be outside the triangle?

Standard deviation or normalized uncertainties? Also need to explain if the normalization is the same as the one shown in an earlier figure.

What do components X and Y mean here? PC components? If so, say so!

Table 1: Title should be placed above the table (different from figures) unless the journal requires to be placed under.

Indicate the number of end-members for each block.

What are their fractional contributions? Are the contributions from fourth and fifth end-members significant?

Table 2: Title should be above the table if the journal does not require to be under.

ED: Reconsider after major revisions (further review by editor and referees) (02 Jul 2021) by Genevieve Ali

AR by Ciaran Harman on behalf of the Authors (20 Sep 2021) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (15 Oct 2021) by Genevieve Ali

RR by Anonymous Referee #2 (07 Dec 2021)

Suggestions for revision or reasons for rejection

I appreciate authors’ patience and continuous effort to improve their work and the manuscript. Compared to its last version, this version has been significantly improved. Publication of this work may have a high impact. Thus, the quality of its presentation must be ensured.

My current concerns focus on the following:

(1) The authors misunderstood the end-member distance, which cannot be inferred from Figure 3. It is not the geometrical distance between two end-members in the mixing diagram, but the distance between the U- and S-spaces (defined by Christophersen and Hooper, 1992) for individual solutes of each end-member. It is a measure of how well an end-member fits the mixing space. Examining the end-member distances may provide strong support for the end-member characterization by CHEMMA. Field sampling from a specific location may not perfectly represent the characteristics of an end-member due to spatial (and at some point temporal) variation, but CHEMMA may do better in this case. Any improvement would be reflected by decreases in the end-member distances between the U- and S-spaces (the shorter the end-member distances, the better the fit to the mixing space; the fit is perfect if the distances are zero for all solutes of an end-member).

This calculation is needed because in this study there is a lack of quantitative evaluation of end-member compositions by CHEMMA, other than qualitative comparison with field measurements.
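One reading of this calculation is sketched below. It assumes that the end-member distance is obtained by standardizing a candidate end-member with the stream mean and standard deviation, projecting it onto the mixing subspace spanned by the leading principal components of the stream data, back-transforming, and comparing the projected composition with the original one solute by solute. The function name, the percent normalization, and the synthetic values are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

def end_member_distance(stream, end_member, n_pcs):
    """Per-solute percent difference between an end-member's composition and
    its projection onto the n_pcs-dimensional mixing subspace of the stream data."""
    mean = stream.mean(axis=0)
    std = stream.std(axis=0, ddof=1)
    z = (stream - mean) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    v = vt[:n_pcs]                                 # basis of the mixing subspace
    b = (end_member - mean) / std                  # standardized end-member
    projected = (b @ v.T @ v) * std + mean         # projection, de-standardized
    return 100.0 * (projected - end_member) / end_member

# Hypothetical data: stream samples mixed from 3 end-members measured in 4 solutes.
rng = np.random.default_rng(0)
end_members = np.array([[1.0, 10.0, 5.0, 2.0],
                        [8.0, 2.0, 1.0, 9.0],
                        [4.0, 6.0, 9.0, 3.0]])
stream = rng.dirichlet(np.ones(3), size=200) @ end_members

# A true end-member lies in the mixing space, so its distances are ~0 for all solutes.
d_true = end_member_distance(stream, end_members[0], n_pcs=2)

# A poorly characterized end-member (last solute biased) lies off the mixing space.
biased = end_members[0] + np.array([0.0, 0.0, 0.0, 5.0])
d_biased = end_member_distance(stream, biased, n_pcs=2)
```

Comparing such distances for field-measured compositions and for CHEMMA-derived compositions would give the quantitative evaluation requested here, with smaller absolute values indicating a better fit to the mixing space.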

(2) Thanks for explaining “percent end-member limited”! But I think it is still hard to follow, particularly because it is not explicitly described in the method section. If I understand it right, samples were generated randomly first and then screened by end-member criteria, with those outside the triangle discarded. If so, it should be called “percent of samples limited by end-member criteria”, which makes more sense.

More importantly, I do not understand the point you were making using those outside samples. Did you mean that the more samples lie outside the triangle, the better the end-members were characterized, as shown by Figures 7 and 8? If this is indeed what you meant, it does not seem to be correct (well, actually, right results for wrong reasons) and is even misleading. It is misleading because readers could take the samples outside the triangle in your case to be the same as the outliers we often see in real samples. Essentially, cases 4, 5, and 6 did better because they contained more “extreme” samples that are closer to the true end-members. I use “extreme” here to indicate samples having an extremely high fraction of one end-member but extremely low fraction(s) of the others. This is analogous to using baseflow for groundwater in some real-world cases. If this is what you really meant, I think you have to change your description and make sure readers will not be misled.

Actually, a better, more intuitive way to do this is to generate random samples while applying the constraints at the same time, with varying constraints on the end-member fractions, e.g., case 1: 0.4-0.6, case 2: 0.3-0.7, case 3: 0.2-0.8, case 4: 0.1-0.9, and case 5: 0-1.0. This will control the number of samples closer to the true end-members.
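A minimal sketch of such a constrained generator is given below, under the reading that the same bounds apply to every end-member's fraction at once. It uses simple rejection sampling from a uniform Dirichlet distribution, which is an illustrative choice rather than anything prescribed in the review. The function name and the end-member values are hypothetical. Under this reading, three fractions that must each be at least 0.4 cannot sum to 1, so the sketch checks feasibility before sampling.

```python
import numpy as np

def sample_bounded_fractions(n_samples, n_end_members, lo, hi, rng):
    """Draw mixing fractions that sum to 1 with every fraction in [lo, hi],
    by rejection sampling from a uniform Dirichlet distribution."""
    if n_end_members * lo > 1.0 or n_end_members * hi < 1.0:
        raise ValueError("no mixture can satisfy these bounds")
    kept, n_kept = [], 0
    while n_kept < n_samples:
        f = rng.dirichlet(np.ones(n_end_members), size=4 * n_samples)
        ok = np.all((f >= lo) & (f <= hi), axis=1)   # keep rows inside the bounds
        kept.append(f[ok])
        n_kept += int(ok.sum())
    return np.concatenate(kept)[:n_samples]

# Hypothetical end-member compositions (3 end-members, 4 solutes).
end_members = np.array([[1.0, 10.0, 5.0, 2.0],
                        [8.0, 2.0, 1.0, 9.0],
                        [4.0, 6.0, 9.0, 3.0]])

rng = np.random.default_rng(0)
fractions = sample_bounded_fractions(500, 3, lo=0.2, hi=0.8, rng=rng)
stream = fractions @ end_members   # synthetic stream chemistry for this case
```

Widening the bounds from case to case moves the synthetic samples closer to the true end-members, which is the control the referee describes. Rejection sampling becomes slow when the feasible region is small, so a direct sampler would be preferable for tight bounds.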

(3) Some statements are still pretty speculative (though they may not be incorrect). I suggest sticking with what your data (figures and tables) actually show and avoiding stretching too much or being too inclusive. See examples below where this happens.

Miscellaneous Comments:

P1/L15-16: This clause is awkward. I think you meant that “a subset of samples with extremely high and low fractions of end-member contributions … under extreme hydrologic conditions”.

P2/L30-33: Cite the literature these statements came from. I think some of the wording is not accurate, e.g., “approximately” and “additional end-members” used in the statements.

P2/L46: Try not to use ambiguous phrases. Change to "a similar approach".

P3/L85-86: Not completely true. People used baseflow for groundwater in many cases. Change the statement to “there is not a method …, other than using baseflow to characterize groundwater …”. Add a reference (e.g., Liu et al., Ecohydrology, 2008).

P4-6/L116-146: I think “k” has a different meaning before and after Line 127. I think it means the number of PCs or the number of dimensions before Line 127, but the number of end-members or vertices after Line 127. Because the number of end-members is one more than the number of mixing dimensions, “k” cannot be used in the same way in the two sections.

P5/L132: Cite "Step 3, Figure 1a" here.

P8/L214: You mean “four magenta diamonds”?

P8/L229: Not assumed. Using "suggested" may be a better word choice.

P8/L234-235: Good to have such a statement, but it is too casual and lacks specifics.

P9/L244-245: Why not cite the results of DTMM to support your statement here and also in the conclusion (P12/L359-360)?

P9/L255-256: An example of a speculative statement without a demonstration or support (though not necessarily incorrect).

P10/L280-286: Parenthesize these sequential numbers to avoid unnecessary confusion with counting numbers.

P16/L317-319: Both statements are awkward, but I know what you meant. “…small contribution …” in the first sentence refers to samples having extremely high and low fractions of end-member contributions under extreme hydrologic conditions and the second refers to some potential end-members exist in the catchment but their impact to streamflow and chemistry is insignificant. If so, change them.

P11/L330-339: Discussion mode.

P11/L336-338: Awkward and hard to follow.

P11/L342: Fix the grammar here.

P12/L345: I understand it but it would be hard for others to follow or to get it right away. I think you were talking about extreme samples here as I mentioned earlier.

P12/L351-354: Use consistent numbering. Used numbers before but letters here.

P12/L359-360: DTMM results should be cited in the results section.

P12/L361: Change “might” to “should”.

P12/L363-364: Readers will be lost by the sudden introduction of these terms. Citing a perception is better.

P12/L366 (beginning): Awkward and hard to follow.

P12/L362-369: Use consistent numbering, including parentheses (half or complete) throughout the manuscript.

ED: Publish subject to minor revisions (review by editor) (10 Dec 2021) by Genevieve Ali

AR by Ciaran Harman on behalf of the Authors (18 Jan 2022) Author's response Author's tracked changes Manuscript

ED: Publish subject to technical corrections (14 Feb 2022) by Genevieve Ali

AR by Ciaran Harman on behalf of the Authors (23 Feb 2022) Manuscript

Short summary

Water in streams is a mixture of water from many sources. It is sometimes possible to identify the chemical fingerprint of each source and track the time-varying contribution of that source to the total flow rate. But what if you do not know the chemical fingerprint of each source? Can you simultaneously identify the sources (called end-members), and separate the water into contributions from each, using only samples of water from the stream? Here we suggest a method for doing just that.