Supplement of Machine learning deciphers CO 2 sequestration and subsurface ﬂowpaths from stream chemistry

Abstract. Endmember mixing analysis (EMMA) is often used by hydrogeochemists
to interpret the sources of stream solutes, but variations in stream
concentrations and discharges remain difficult to explain. We discovered
that machine learning can be used to highlight patterns in stream chemistry
that reveal information about sources of solutes and subsurface groundwater
flowpaths. The investigation has implications, in turn, for the balance of
CO2 in the atmosphere. For example, CO2-driven weathering of
silicate minerals removes carbon from the atmosphere over ∼106-year timescales. Weathering of another common mineral, pyrite, releases sulfuric
acid that in turn causes dissolution of carbonates. In that process,
however, CO2 is released instead of sequestered from the atmosphere. Thus, understanding long-term global CO2 sequestration by weathering
requires quantification of CO2- versus H2SO4-driven
reactions. Most researchers estimate such weathering fluxes from stream
chemistry, but interpreting the reactant minerals and acids dissolved in streams has been fraught with difficulty. We apply a machine-learning
technique to EMMA in three watersheds to determine the extent of mineral
dissolution by each acid, without pre-defining the endmembers. The results
show that the watersheds continuously or intermittently sequester CO2, but the extent of CO2 drawdown is diminished in areas heavily affected
by acid rain. Prior to applying the new algorithm, CO2 drawdown was
overestimated. The new technique, which elucidates the importance of
different subsurface flowpaths and long-timescale changes in the watersheds,
should have utility as a new EMMA for investigating water resources
worldwide.



Section 1 NMF Model
To employ NMF on limited datasets of stream chemistry, a bootstrapped data set was generated using a multivariate normal distribution of log-transformed stream water chemistries, similar to the procedure outlined in Lautz et al. (2014). The bootstrapped dataset matches the measured means of the log-transformed stream water chemistries and maintains covariation between analytes. A comparison between the measured and bootstrapped data sets can be seen 5 in Fig. S2. All of the input features were normalized to values between 0 and 1 to not bias the model training to any one input feature. Next, the model was trained to the bootstrapped dataset using NMF algorithms in the python library scikit-learn (Pedregosa et al., 2011). Lastly, the trained model was applied to measured stream water samples to delineate mixing proportions.
The model results are sensitive to the random initiation of the H matrix (i.e., endmember chemistries) used in 10 the training. To produce a more robust decomposition, the starting H matrix was randomly initiated 20,000 times. For each stochastic iteration, we used NMF to calculate optimal W and H matrices (Eq 1) and then filtered out any models with proportions that did not add to 1 + 0.05. Additionally, the fit of the model was evaluated from SSE: Remembering that no carbonates dissolve into water flowing along the shallow path, then similar arguments for the shallow flowpath yield: With respect to the atmosphere considered over the long-term (10 5 -10 6 yr), H2SO4-weathering of silicates and CO2weathering of carbonates are CO2 neutral, while CO2-weathering of silicates sequesters CO2 and H2SO4-weathering 110 of carbonates releases CO2 (Fig. 1). As seen in Figure 1, per mole of CaSiO3 or CaCO3 weathered, CO2-weathering of silicates sequesters 1 mol of CO2 and H2SO4-weathering of carbonates releases 0.5 moles of CO2. In terms of [Σ . ]total, CO2-weathering of silicates sequesters 0.5 moles of CO2 per base cation equivalent released into solution and H2SO4weathering of carbonates releases 0.25 moles of CO2 per base cation equivalent released into solution ( Fig. 1; Reactions 2, 3, 6, and 7). For a given water sample, the cation concentrations record the extent of dissolution of 115 carbonate and silicates, as long as the contribution of these base cations from acid rain is minimal. (For simplicity, we do not correct [Σ . ] for rain chemistry but see SM Section 4). Therefore, the uptake or release of CO2, Δ ( , can be calculated for any given stream water sample: Using Δ ( , we calculate the flux of CO2 using the discharge measurements for each sample (see Fig. S4).
Next, we will derive kstream, the modern CO2 sequestration coefficient. In general, both kstream and krock (see SM 2.3) are used as ways to note the extent that weathering in a watershed is sequestering or releasing CO2. kstream is the 125 amount of CO2 emitted or sequestered calculated from [Σ . ] /0/12 as described above, normalized by [Σ . ] /0/12 (meq/l): The negative sign is used so that a negative kstream represents sequestration (uptake of CO2), and a positive kstream represents release. From eq. S11 it is apparent that the CO2 emitted or sequestered equals the product, kstream [Σ . ] /0/12 , with the appropriate sign. Total dissolved base cations in a stream draining a watershed with no carbonate nor pyrite are attributed here entirely as CO2-weathering: this watershed demonstrates the highest capacity to sequester CO2 and 8/461) equals -0.5. Substituting from eq. S10 into eq. S11 yields: We can further expand eq. S12 by substituting eq. S6 for [S + ]carbonate-H2SO4 , eq. S9 for [S + ]silicate-CO2 , eq. S4 for [S + ]silicate and eq. S8 for [S + ]silicate-H2SO4 This can be rearranged and simplified as: We then define the second term (ratio of carbonate-derived base cations to total base cations in the stream sample) as gstream and the third term (ratio of the sulfate equivalents (from sulfuric acid) to the equivalents of base cations in the stream) as zstream. Note that to obtain the sulfate equivalents, we multiply [SO4 2-]total by 2, resulting in the third term 150 equal to 0.5zstream. Given these definitions, eq. S14 yields eq. 2 from the main text:

S2.3 Using Rock Chemistry to Calculate CO2 Drawdown or Release
Here we compare the bulk elemental composition of parent rock to topsoil and calculate the difference to determine if the system acted on net as a source or a sink of CO2 over the weathering duration. Of course, this calculation involves inspection only of rock versus soil chemistry and cannot therefore be used to separate CO2-versus H2SO4weathering when the latter is derived from acid rain. The three most important factors are i) the ratio of base cations 160 in carbonates relative to silicates in the rock, ii) the ratio of acid-generating units of pyrite relative to total base cations in carbonate+silicate minerals, and iii) the ratio of base cations still retained in regolith at the land surface relative to total base cations. This latter ratio is related to the chemical depletion factor (written below as -t), i.e., the relative ratio of loss of a component in a rock to chemical weathering versus total loss by physical + chemical weathering (Riebe et al. 2003). For (i), we define the carbonate/silicate factor, grock, which is the proportion of base 165 cation equivalents in the rock derived from carbonate minerals divided by the total base cations: Here CX,k is the mole fraction (mol/kg) of base cation (X = Ca, Mg, Na, or K) in carbonates (k = carb) or in carbonate 170 + silicate minerals (k = Total). By definition, grock ranges from 0 (where all base cations derive from silicates) to 1 (where all base cations derive from carbonates). Likewise, 1-grock is the proportion of base cations derived from silicate minerals.
When pyrite oxidizes it produces sulfuric acid that can dissolve carbonate and silicate minerals. This impacts CO2 dynamics over 10 5 -10 6 yr timescales by releasing CO2 (H2SO4-weathering of carbonates). But it also diminishes 175 the silicate content of the rock, thereby diminishing the rock's capacity to sequester CO2. Here, we define a new variable, zrock, which is the acid generation capacity expressed relative to the base cations in the rock (all on an equivalents basis): Here, the subscript py refers to pyrite (mol/kg rock). We multiply the concentration of pyrite (i.e., Cpy) by 4 (eq/mol) because 4 equivalents of sulfate are produced per mole of pyrite as shown in reaction S17.

185
Lastly, in many catchments, the bulk chemistry of parent rock is not indicative of the CO2 sequestration during weathering because silicate minerals are kinetically slow to dissolve and they do not completely dissolve before the rock physically erodes. On the other hand, we assume here that all carbonate minerals chemically weather away before exposure at land surface, an assumption most useful for wet climates and relatively low-carbonate 190 content rocks. The relative depletion of an element in a weathered rock with respect to the parent rock is easily calculated from the mass transfer coefficient, t. 195 Here, C is the concentration of a base cation (j) or an immobile element (i) in the parent or weathered rock. When t at the top of the weathering profile is 0, the composition of the weathering material is the same with respect to base cations and immobile element i as the parent and none of these elements have been lost to solution (they will be eroded instead of chemically weathered). When t = -1, all of the element has been lost to solution and none is left to erode away.

200
Using the variables grock, zrock, and t, we now define krock, the long-term CO2 sequestration coefficient of the rock: Here, (1-grock) is the proportion of base cation equivalents associated with silicate minerals. We multiply this by 0.5 because 1 mol of CO2 is sequestered during weathering of 2 eq of base cations when considered over 10 5 to 10 6 yr timescales (see Fig. 1 reactions 1 and 2). If pyrite oxidation is coupled to carbonate dissolution, 2 mols of CO2 are released per mole of pyrite in the rock (see Fig. 1 reaction 7), yielding the term − -( zrock based on eq S16. Likewise, pyrite oxidation could be coupled to silicate dissolution. In this case, 1 mol of pyrite consumes 2 mols of silicate 210 minerals. Because 1 mol of Ca-silicate mineral sequesters 1 mol of CO2 over 10 5 to 10 6 yr timescales (see Fig. 1 reactions 1 and 2), CO2 sequestration is reduced by 2 mols of CO2 per mole pyrite in the rock. Again, based on eq S16, this is equivalent to − -( zrock. Lastly, tsilicate cations is the mass transfer coefficient for base cations in silicates at the land surface. It ranges from 0 (no base cations in silicate minerals have been removed by dissolution) to -1 (all the base cations in silicate minerals have been removed by dissolution). 215 Finally, noting that tsilicate cations is generally not reported, we must instead calculate it from t, the mass transfer coefficient for total base cations in the bulk rock: Again, we emphasize wet climates and low-carbonate terrain and implicitly assume that all carbonates are fully dissolved at the land surface (i.e., tcarbonate cations = -1) to solve for tsilicate cations: Now we substitute eq. S21 in eq. S19 and simplify to the final equation 3 from the main text: When krock < 0, the rock has sequestered CO2 from the atmosphere over the residence time of the soil and when krock > 0 the rock released CO2.

225
Mathematically, this equation is only valid as long at t < -grock. The minimum value of krock is -0.5, which is a pure silicate rock dissolved only by CO2. The maximum value of krock is 0.25, which is a pure carbonate rock weathered only by sulfuric acid. It is mathematically impossible for krock < -0.5; however, it is mathematically possible to have krock > 0.25. In these situations, there is more sulfuric acid in the system than can be buffered by both carbonate and silicate weathering.

Lag-time Calculation
Using rain chemistry data from the National Atmospheric Deposition Program (NADP; http://nadp.slh.wisc.edu/) site PA42, we calculated the annual flux of sulfate into Shale Hills from wet deposition. We used the flux data to calculate a trend in wet deposition over time and then used the regression to calculate when 39.5 mmol m -2 yr -1 was 235 deposited (i.e., 31 years prior to today). Next we added dry deposition as an input (estimated as 30% wet deposition; Lynch and Corbett;1989), fit a new regression to wet+dry deposition over time, and recalculated the lag time (i.e., 19 years; Fig. 4C). Although not explicitly calculated here, Hubbard Brook also shows a lag in deposition to export on similar timescales, which is consistent with the excess sulfate export observed in other studies (Likens et al., 2002).

Mineral-derived Solute Concentrations
The contributions of ankerite and calcite to the Ca 2+ budget were calculated using the composition of the appropriate Here, 1.6 is the stoichiometric number relating Mg 2+ to Ca 2+ in Ankerite (see Table S3 Here, 0.28 is the stoichiometric number relating K + to Mg 2+ in illite (see Table S3). The concentration of chloritederived Mg 2+ is calculated as the difference between the total Mg 2+ , the ankerite-derived Mg 2+ and the illite-derived Mg 2+ . Fluxes of solutes derived from each mineral are summarized in Table S5 for Shale Hills.

Section 3 Seasonality of Pyrite-sulfate Fluxes
At Shale Hills, the proportion of pyrite-derived sulfate leaving the catchment accounts for 23% of the annual sulfate flux (Table 1) but ranges from 99% of total sulfate in the dry season (summer, fall) to as low as 3% in the wet season (winter, spring, Fig. 5A). This is easily explained because the stream is sustained by deep groundwater that flows up 265 into the stream from the deep pyrite reaction front during the dry summer and fall but not in the winter and less acid rain enters the catchment in the dry season (Li et al., 2017).

Section 4 Rain-correction
For simplicity, we do not correct [Σ . ]total (eq. S3) for rain chemistry; however, it is likely that some of the cations in  Table S6).    Calcite