Reconstructing five decades of sediment export from two glaciated high-alpine catchments in Tyrol, Austria, using nonparametric regression

. To date, Knowledge on the response of sediment export to recent climate change in glaciated areas in 10 the European Alps is limited, knowledge on the effects of decadal-scale changes in climatic forcing on sediment export from glaciated high alpine areas is still limited. This is primarily due to the extreme scarcity ofbecause sufficiently longlong-term records of suspended sediment concentrations (SSC) are scarce, which precludes robust explorations of longer-term developments. Aggravatingly, insights are not necessarily transferable from one catchment to another, as sediment export can heavily depend on local preconditions (such as geology or 15 connectivity). However, gaining a better understanding of past sediment export is an essential step towards estimating future changes, which will affect downstream hydropower production, flood hazard, water quality and aquatic habitats. Here we tested the estimation of sediment export of the past five decades usingtest Quantile Regression Forest (QRF), a non-parametric, multivariate regression based on Random Forests. The regression builds on the 20 feasibility of reconstructing decadal-scale sediment export from short-term records of SSC and long time seriesrecords of the most important hydro-climatic drivers (discharge, precipitation and air temperature (QPT)). Specifically, we test Quantile Regression Forest (QRF), a non-parametric, multivariate approach based on Random Forests. We trained independent models for the two nested and partially glaciated catchments, Vent (98 km²) and Vernagt (11.4 km²), in the Upper Ötztal in Tyrol, Austria (1891 to 3772 m a.s.l.), where available QPT records 25 start in 1967 and 1975. To assess temporal extrapolation ability, we used two 2-year SSC datasets at gauge Vernagt, which are almost 20 years apart for a validation. For Vent, we performed a five-fold cross-validation on the 15 years of SSC measurements. Further, we quantified the number days where predictors exceeded the range represented in the training dataset, as the inability to extrapolate beyond this range is a known limitation of QRF. Finally, we compared QRF performance to sediment rating curves (SRC). We analysed the modelled sediment

which are almost 20 years apart for a validation.For Vent, we performed a five-fold cross-validation on the 15 years of SSC measurements.Further, we quantified the number days where predictors exceeded the range represented in the training dataset, as the inability to extrapolate beyond this range is a known limitation of QRF.
Finally, we compared QRF performance to sediment rating curves (SRC).We analysed the modelled sediment export time series, the predictors and glacier mass balance data for trends (Mann-Kendall test and Sen's slope estimator) and step-like changes (using the widely applied Pettitt's test and a complementary Bayesian approach).
Our validation at gauge Vernagt demonstrated that QRF performs well in estimating past daily sediment export (Nash-Sutcliffe efficiency (NSE) of 0.73) and satisfactory for SSC (NSE of 0.51), despite the small training dataset.
Temporal extrapolation ability of QRF was superior to SRC, especially in periods that contained high SSC events, which demonstrated the ability of QRF to model threshold effects.Days with high SSC tended to be underestimated, but the effect on annual yields was small.Days with predictor exceedances were rare, indicating a good representativity of the training dataset.Finally, the QRF reconstruction models outperformed SRC by about 20 percent-points of explained variance., to gain a comprehensive overview of sediment dynamics.In Vent, daily QPT records are available since 1967, alongside 15 years of SSC measurements.At gauge Vernagt, QPT records started in 1975 in hourly resolution, which allows comparing model performances in hourly and daily resolution (Validation A).Challengingly, only four years of SSC measurements exits at gauge Vernagt, yet consisting of two 2-year datasets, that are almost 20 years apart, which provides an excellent opportunity for validating the model's ability to reconstruct past sediment dynamics (Validation B).
As a second objective, we aim to assess changes in sediment export by analyzing the reconstructed time series for trends (using Mann-Kendall test and Sen's slope estimator) and step-like changes (using two complementary change point detection methods, the widely used Pettitt's test and a Bayesian approach implemented in the R package 'mcp').
Our findings demonstrate that QRF performs well in reconstructing past daily sediment export (Nash-Sutcliffe efficiency of 0.73) as well as the derived annual sediment yields (Validation B), despite the small training dataset.
Further, our analyses indicate that the loss of model skill in daily as compared to hourly resolution is small (Validation A).We find Ssignificant positive trends in the reconstructed annual suspended sediment yields were found at both gauges, with distinct step-like increases around 1981.This was linked to increased glacier melt, which became apparent through step-like increases in discharge at both gauges as well as change points in mass balances of the two largest glaciers in the Vent catchment.We This coincides with a crucial point in glacier melt dynamics: we find co-occurring change points in annual and summer mass balances of the two largest glaciers in the Vent catchment.This is also reflected in a coinciding step-like increase in discharge at both gauges as well as a considerable increase in the accumulation area ratio of the Vernagtferner glacier.We identifiedy exceptionally high July temperatures in 1982 and 1983 as a likely cause, as July is the most crucial month with respect to firn and ice melt..In contrast, we did not find coinciding change points in precipitation.Opposing trends at the two gauges after 1981 suggest different timings of 'peak sediment'.We conclude that, given large enough training datasets, the This study demonstrates that the presented QRF approach is a promising tool with the ability to deepen our understanding of the response of high-alpine areas to decadal climate change.This in turn will aid estimating future changes and preparing management or adaptation strategies.

Introduction
Sediment production rates per unit area are highest in the world's mountains (Hallet et al., 1996), headed by modern glaciated basins (Hinderer et al., 2013).As a consequence, sediments transported from these rapidly deglaciating high alpine areas have substantial effects on downstream hydropower production and reservoir sedimentation (Schöber and Hofer, 2018;Guillén-Ludeña et al., 2018;Li et al., 2022), flood hazard (Nones, 2019; Brooke et al., 2022) as well as water quality, nutrient and contaminant transport and aquatic habitats (Gabbud and Lane, 2016;Bilotta and Brazier, 2008;Vercruysse et al., 2017) and impact global sediment and biochemical balances (Herman et al., 2021).Thus, there is considerable interest in water resource research and management to gain better understanding of sediment dynamics in high alpine areas, also to mitigate and adapt to future changes.
However, there is still limited quantitative understanding of sediment transport in high-alpine rivers and their relation to changes in climatic forcing over relevant temporal scales relevant to investigating changes associated with anthropogenic climate change, i.e. at decadal and centennial scales as opposed to longer ones (Huss et al., 2017;Antoniazza and Lane, 2021;Herman et al., 2021).This is partly owed to the complexity of sediment dynamics in high-alpine areas, which are the result of an intricate system of climatic forcing and hydrogeomorphological processes (Costa et al., 2018;Vercruysse et al., 2017;Zhang et al., 2022).
Yet the magnitude of these impacts is catchment-specific, as it depends e.g. on the area occupied by glaciers and permafrost or basin hypsometry (Huss et al., 2017) and is thus not easily transferable from one site to another.
Aggravatingly, high-alpine sediment dynamics are highly variable over time, so that long time series are required to assess systematic changes.Yet, most records are too short for such analyses and long-term enough data are extremely rare, especially in glaciated headwaters, which are often especially challenging to monitor.
To our knowledge, only very few examples of decadal sediment records from the European Alps exist in the current literature, as opposed to their availability for e.g.High-mountain Asia (Singh et al., 2020;Li et al., 2020Li et al., , 2021;;Zhang et al., 2021), the Andes (Vergara et al., 2022) or the Arctic (Bendixen et al., 2017a) (for an extensive review, see (Zhang et al., 2022).. Costa et al. (2018b) report on an exceptional record of suspended sediment concentrations from the Upper Rhône basin, Switzerland, of almost five decades, albeit these recordings are severely affected by anthropogenic impacts (hydropower generation and gravel mining) and integrate over an area of 5340 km².Michelletti and Lane (2016) and Lane et al. (2017) reconstructed coarse sediment export from hydropower intake flushings at decadal scales in three small catchments in the Hérens Valley, Switzerland, however, not taking into account the amount of suspended sediment transport, which is often at least as large as the amount transported as bedload (Schöber and Hofer, 2018;Mao et al., 2019;Turowski et al., 2010).Further long-term sediment records can be inferred from sediment stratigraphy (e.g., (Bogen, 2008;Lane et al., 2019)yet such studies are of course confined to catchments where lakes or reservoirs are present.To compensate for this lack of measurement data, we aim to estimate longer-term past suspended sediment dynamics based on the available shorter records of suspended sediment concentrations.
Quantile Regression Forest (QRF) (Meinshausen, 2006) represents an approach that has been successfully applied to modelling suspended sediment concentrations in past geomorphological studies, in badland-dominated catchments in Spain (Francke et al., 2008a, b) and in a tropical forest in Panama (Zimmermann et al., 2012).QRF is a multivariate, non-parametric regression technique based on Random Forests, from the category of machine learning (ML) approaches, which generally seek to identify patterns from complex data (Tahmasebi et al., 2020).
Comparative studies have reported that QRF performs favorably compared to sediment rating curves and generalized linear models (Francke et al., 2008a) and that performance of Random Forest (which QRF are based on) was superior to support-vector machines and artificial neural networks (Al-Mukhtar, 2019) in modelling suspended sediment concentrations.As an advantage to other ML approaches, QRF allow to quantify the model uncertainty through estimating prediction accuracy (Francke et al., 2008a;Al-Mukhtar, 2019).
Thus, the first objective of this study was to extensively test QRF as an approach for estimating past suspended sediment dynamics at decadal scales in high alpine areas.Previous studies have included proxies for drivers of sediment transport and erosive processes, e.g.precipitation, discharge, seasonality and antecedent conditions (Francke et al., 2008a, b;Zimmermann et al., 2012).We built on this setup and adapted it by including air temperature as a predictor, since many processes relevant to sediment dynamics in high alpine areas are temperature-sensitive (e.g.snow-and glacieral melt, thawing of topsoil, etc.).To assess model performance, we evaluated several validations and compared the results to sediment rating curves based on data from the two (Tahmasebi et al., 2020) (Francke et al., 2008b, a;Zimmermann et al., 2012) gauges "Vent Rofenache" and "Vernagt" that are located in a nested catchment setup in the Rofental, within the Upper Ötztal in Tyrol Austria.
This nested setup provides a favorable opportunity to test the QRF model and gives a good overview of sediment dynamics in this catchment.
Thus, the first objective of this study is to test a quantile regression forest approach for the reconstruction of suspended sediment dynamics at decadal scales.Quantile regression forests (QRF) (Meinshausen, 2006) are a multivariate non-parametric regression technique based on random forests, that have performed favorably to sediment rating curves and generalized linear models in modelling suspended sediment concentrations (Francke et al., 2008a).As an advantage to other machine-learning approaches, QRF allow to quantify the model uncertainty (Francke et al., 2008a;Al-Mukhtar, 2019).In past studies, QRF have been successfully applied for sedigraph reconstruction in badland-dominated catchments in Spain (Francke et al., 2008a, b) and in a tropical forest in Panama (Zimmermann et al., 2012), specifically by including proxies for erosive processes, e.g.precipitation, discharge and seasonality.We adapted this approach to high alpine catchments by adding air temperature as a predictor in addition to discharge and precipitation.We then applied it to the gauges Vent Rofenache and Vernagt in a nested catchment setup in the Rofental, located in the Upper Ötztal in Tyrol Austria.The data situations at the two gauges bear different challenges and opportunities, and taken together give a good overview of sediment dynamics in this location.
The second objective of this study is to examine the resulting estimates of annual suspended sediment yields with respect to changes at decadal scales.Thus, we analyzed the time series with respect tofor trends, some of which could be expected e.g.due to ongoing temperature increase.However, the possibility of sudden, tipping-point-like shifts in response to climatic changes has been suggested for cryospheric geomorphic systems as well (Huggel et al., 2012) and especially sediment dynamics (Vercruysse et al., 2017) and indeed a step-like increases in has been observed on suspended sediment concentrations in the Rhône basinhave been observed in other catchments (Costa et al., 2018;Li et al., 2020Li et al., , 2021;;Zhang et al., 2021).Hence, we used change point detection methods to assess whether the detected trends are gradual or follow a step-like pattern.We extended theise analyseis to the predictors (temperature, discharge and precipitation) as well as annual mass balances to assess possible reasons for changes in suspended sediment yields.

Study area
The study area is located in the "Rofental", a valley in the Upper Ötztal in the Tyrolean Alps, Austria.For more than 100 years, the area has been subject to intense glaciological and hydrometeorological research , yielding outstanding data sets (Strasser et al., 2018).The entire catchment is 98.1 km² upstream gauge "Vent Rofenache" (hereafter "Vent") and nested within is the 11.4 km² catchment upstream gauge "Vernagt" (see FigFig. 1).The gauge Vent is run by the Hydrographic Service of Tyrol; and gauge Vernagtferner by the Bavarian Academy of Sciences and Humanities.The catchments' elevations range from 1891 m a.s.l. at gauge Vent and 2635 m at gauge Vernagt to the summit of Wildspitze, the highest peak in Tyrol, at 3772 m.
The study area's shielded location in the inner Alpine region leads to the relatively warm and dry climate (Kuhn et al., 1982).Mean annual temperature and precipitation at the gauge in Vent are 2.5°C and about 660 mm (Hanzer et al., 2018;Hydrographic yearbook of Austria, 2016;Strasser et al., 2018), but a considerable precipitation gradient with elevation of about 5 % per 100 m has been described (Schöber et al., 2014).
Both catchments are heavily glaciated, with approximately 28 % and 64 % glacier cover in 2015 (Buckel and Otto, 2018).The two largest glaciers within the Vent catchment, Vernagtferner and Hintereisferner have been systematically observed since the 1950s and 60s and have shown accelerating retreat since the beginning of the 1980s (Abermann et al., 2009), which is expected to continue in the future (Hanzer et al., 2018).
The catchment is drained by the river Rofenache, which flows into the Ötztaler Ache, one of the largest tributaries to the river Inn.The hydrological regime (glacial to nival) shows a pronounced seasonality as most of the discharge is fed by snow and glacier melt and about 89 % of the discharge in Vent occurs from April to September (Schmidt et al., 2022b;Hanzer et al., 2018).
The Ötztal Alps are part of the Ötztal-Stubai massif within the crystalline central Eastern Alps and the catchment geology is dominated by biotite-plagioclase, biotite and muscovite gneisses, variable mica schists and gneissic schists (Strasser et al., 2018).The land cover of higher elevations is dominated by glaciers, bare rock or sparsely vegetated terrain whereas mountain pastures and coniferous forests are present at lower altitudes.
Suspended sediment concentrations at the gauge in Vent showed to be the highest in an Austria -wide comparison (Lalk et al., 2014) and annual suspended sediment yields are about 1500 t/km² on average (Schmidt et al., 2022b).
Suspended sediment dynamics showed an even more pronounced seasonality compared to discharge, as 99 % of the annual suspended sediment yields are transported from April to September (ibid.).(Land Tirol, 2016), glacier inventory 4 (2015) (Buckel and Otto, 2018), rivers from tiris open government data (Land Tirol, 2021).

Methods
In essence, we trained a non-parametric model on suspended sediment concentrations and predictors (discharge, temperature, precipitation) and then used this model and long-term records of the predictors to reconstruct past suspended sediment concentrations (SSC) and derive annual suspended sediment yields.We then analyzed the resulting time series with respect to trends and change points.(Schmidt et al., 2022b) (Schmidt et al., 2022b) In the following section, we will briefly describe the general modeling approach and the validations, the Quantile Regression Forest (QRF) approach and the employed predictors, the input data and their preparation for the model, the general idea of QRF, the construction of ancillary predictors and our modifications, the modelling structure we applied and our validations as well as the statistical tools used for the analysis of the results.
In our analyses, we focus on (annual) suspended sediment yields instead of concentrations, due to the very strong seasonality in discharge (Schmidt et al., 2022b).This gives more weight to days with higher discharge, which are more influential for overall sediment export to downstream areas and the potential resulting problems, as opposed to e.g.mean annual SSC, which weigh days at the beginning and end of the season, with low discharge and low concentrations, and days during the glacier melt period equally.We use the term 'reconstructing' to describe the estimates of SSC (or the resulting yields) from our model simulations.

General modeling approach and adaptations to conditions at the two gauges
We combined the analysis of the two gauges Vent and Vernagt to gain a more reliable and comprehensive understanding of past sediment dynamics in the Rofental.In this, the data situations at the two gauges bear different challenges and opportunities.
At gauge Vent, continuous turbidity-derived SSC time series have been recorded in high temporal resolution (15 minutes) since 2006, providing abundant training data for our model.Additionally, the long -term predictor data are available back until 1967, facilitating insights into long-term changes in catchment dynamics -yet only in daily resolution, which predetermines the temporal resolution of the reconstruction model.This is challenging as sediment concentrations vary considerably during one day, leaving us with the need to assess whether a daily model adequately represents sediment dynamics.
At gauge Vernagt, the availability of hourly discharge (Q), precipitation (P) and temperature (T) data back until 1974 is remarkable.Yet, turbidity-derived SSC data have only been recorded for the years 2000, 2001, 2019 and 2020 (see Fig. 2, upper panel "Vernagt", left side, plots labelled "SSC").Additionally, the data in 2019 and 2020 are affected by episodic siltation and periods when the turbidity sensor reached satura tion: 0.47 % of the two summers of data were affected by saturation and about 8 % were affected by siltation.The latter was mainly due to one period of about 16 days in August 2019.Additionally, there were three shorter incidents (1.5 hours to 1.5 days), two in 2019 and one in 2020.These issues first need to be dealt with in order to provide accurate training data for our model, as it is sensitive to the range of values represented in the training data (Francke et al., 2008a).
On the other hand, the 16-year gap between the measurement periods provides the rare opportunity to verify the model's skill in estimating past SSC against real measurement data.
To address these issues and benefit from these opportunities, we preformed three preparatory steps before the final reconstruction models (Fig. 2): 1. Gap-filling model: At gauge Vernagt, we trained a model on SSC determined from 131 water samples and the predictors (Q, P, T) in the highest possible (i.e.10-minute) resolution (Fig. 2, upper panel).We used the resulting modelled SSC to replace periods in the 2019/20 SSC data that were affected by siltation or sensor saturation.As the sampling scheme was customized to cover hydro-sedimentological conditions as widely as possible (see Schmidt et al., 2022), the water samples also partially cover the periods to be addressed by the gap filling.Finally, we ran the 'reconstruction models' in daily resolution (for consistency between the two gauges), taking into account all available training data to estimate SSC back until 1967 and1975, respectively.Beyond Validation A and B, we assessed whether and how often the ranges of the predictors in the training data were exceeded during the reconstruction period, since it is a known limitation of QRF that it is not possible to extrapolate beyond the range of values represented in the training data (Francke et al., 2008a).For example, if the discharge measured on a day in the reconstruction period exceeds the maximum discharge within the training period (Qtrain, max), the model potentially underestimates SSC on that day.
Additionally, we compared QRF performance to performance of sediment rating curves, one of the most commonly used approaches to estimate suspended sediment concentrations (Vercruysse et al., 2017), by fitting a power function between SSC and Q of the form  =  •   , where a and b are regression coefficients.

Quantile Regression Forest (QRF) for modelling suspended sediment concentrations
To reproduce suspended sediment dynamics, the desired models have to account for a multitude of processes controlling SSC (discharge dynamics, temperature-dependent activation of sediment sources and transport processes, precipitation, antecedent wetness conditions, etc.), need to deal with non-normal distributions of both predictor and response variables and have to handle correlations between the predictors (Zimmermann et al., 2012).
Quantile Regression Forests (QRF) (Meinshausen, 2006) represent a multivariate approach that can deal with nonlinearity, interactions and non-additive behavior without making assumptions on underlying distributions, and performed favorably in reproducing sediment dynamics as compared to generalized linear m odels or sediment rating curves (Francke et al., 2008a).QRF are a generalization of Random Forests (RF) regression tree ensembles (Breiman, 2001).Regression trees (a.k.a.CARTs) (Breiman et al., 1984) apply recursive rule-based data partitioning in order to group data with similar values for the response variable (Francke et al., 2008a).To produce a "forest", RF and QRF employ an ensemble of these trees, each one grown on a random subset (bootstrap sample) of the training data.Predictive performance is evaluated on those parts of the data that are not considered in the bootstrap sample, i.e. the out-of-bag data (OOB) (ibid.).Model predictions are then obtained from the mean of predictions of each single tree (RF) or based on the distribution of these single-tree predictions (QRF).As QRF keeps the value of all observations within a node, it enables the quantification of the model uncertainty with the help of these inherent ensemble characteristics.This represents an advantage to other non-parametric approaches applied to SSC modelling, such as traditional fuzzy logic or artificial neural networks (ibid.).
Building on the model setup developed by (Francke et al., 2008b;Francke, 2017;Zimmermann et al., 2012), we chose the predictors to represent proxies for processes important to sediment dynamics in high -alpine areas.For example, discharge is crucial for sediment transfer and, potentially, channel erosion, while precipitation drives hillslope erosion and mass wasting events.We extended the set of primary predictors by adding air temperature, as many processes determining sediment dynamics in high alpine are temperature-sensitive (e.g. the activation of sediment sources, such as the occurrence of rain vs. snow, the availability of sub-and proglacial sediments and their transport through glacier meltwaters, but also potential permafrost thaw).From these three primary predictors (discharge, precipitation and temperature, QPT), we derived information on antecedent conditions for each time step by computing ancillary predictors derived thereof.These ancillary predictors can capture that, e.g., longerterm discharge behavior is linked to potential exhaustion of sediment sources or storage, prolonged warm periods may lead to increased glacier ablation associated with intensified transport of glacial sediment, and high antecedent moisture conditions prior to a precipitation event may favor mass movements.To keep correlation between these derived predictors as low as possible, we used non-overlapping windows of increasing sizes (e.g.24 hours, 24 -72 hours, 72 hours to 7 days and 7 days to 20 days ahead of each time step) to compute sums of the primary predictors (Zimmermann et al., 2012)Zimmermann et al., 2012.To complete the set of predictors, we used the day of year to capture the pronounced seasonality in high alpine sediment dynamics (Schmidt et al., 2022b), as well as the rate of change in discharge (see also Francke et al., 2008b;Zimmermann et al., 2012).
Furthermore, we tuned the model with respect to the length of the antecedent periods considered: We optimized the length of the time windows for the highest model performance with respect to daily / hourly SSC using the Nash-Sutcliffe efficiency index (see section 3.2.1).(Murphy, 2012) of temporally-close points in timeBesides the length of the antecedent periods, the two most important hyper-parameters for the QRF are the number of trees in a "forest" and the number of selected predictors at each node, implemented as parameter "mtry".To increase robustness, we set the number of trees to 1000, which is twice the default value.The parameter "mtry" is optimized for maximum performance in the modelling process (and hardly sensitive).Additionally, we optimized model performance in a cross validation with five folds, as is commonly done (Murphy, 2012).However, unlike cross validation approaches in the classical sense, which randomly pick a set of individual data points, we divided the training data into five equal temporally contiguous chunks to avoid unrealistically good performance simply due to autocorrelation of temporally-close points in time.Finally in the 'reconstruction model', we derived annual suspended sediment yields by performing 250 Monte-Carlo realizations of the annual SSY for each year, which allows assessing the prediction uncertainty, from which the mean and quartiles of annual suspended sediment yields were computed.We chose the number of 250 Monte-Carlo realizations as it yields sufficiently robust estimates of the mean annual SSY (the confidence intervals of the mean are ca.± 1.25 % of the mean) while keeping computation times reasonable.

Characteristics, sources and adjustments of input data
Here we describe the input data and their preparation for modelling.An overview of dDetails such as coordinates of stations and gauges, the temporal resolution of the different time series, their sources and data availability can be found in table A1 in the appendix A.

Discharge data, precipitation and temperature data
Gauge Vent has been operated continuously by the Hydrographic Service of Tyrol since 1967 (Strasser et al., 2018) and discharge has been measured at 15 minute resolution through water stage recordings, complemented by a radar probe since 2000.At gauge Vernagt, discharge has been recorded since 1974 (Bergmann and Reinwarth, 1977) and is determined through water stage recordings and tracer-calibrated stage-discharge relations.
Additionally, this is complemented by ultrasonic stage measurements, providing nearly uninterrupted discharge series since 1974 (Braun et al., 2007).The temporal resolutions of the discharge time series at both gauges varies over time (from five and ten to 60 minutes, for details see table A1 in the appendix).
Precipitation and temperature have been recorded close to the gauge in Vent (see map Fig. 1) since 1935 and are available in at daily resolution.Both time series showed gaps, which creates a problem when using the chosen QRF approach.Thus, towe filled these gaps using in the precipitation data time series, we used data recorded at the gauge Vernagt gauge: We derived a linear model between all days when daily precipitation sums were available from both stations and used the resulting linear model for conversion (i.e.PVent = PVernagt/1.3).Likewise, (aggregated to daily sums and using a correction factor of PVent = PVF/1.3obtained from concomitant periods) and temperature data recorded at Martin-Busch-Hütte (see Fig. 1) were aggregated to daily means and used to fill gaps in the temperature time series (see fig. 1) (aggregated to daily means and using a correction factor of TVent = TMB * 2.8089).As a result, 2 % of the precipitation and 0.25 % of the temperature time series were filled (see Figfig. A1 in the appendix).
At gauge Vernagt, precipitation and temperature have been recorded in high temporal resolution (5 to 60 minutes, for details see table A1  A1 in the appendix).Some gaps still remain, but these, too, are mostly restricted to winters.We excluded the data from 1974 from the analyses at gauge Vernagt, as data were not available for the entire year.

Turbidity and suspended sediment concentration data
At gauge Vent, turbidity has been measured since 2006 using two optical infrared turbidity sensors (Solitax ts-line and Solitax hs-line by Hach) since 2006.To calibrate the turbidity measurements to suspended sediment concentrations (SSC), water samples are taken manually from the stream close to the turbidity sensors frequently (Lalk et al., 2014).Turbidity measurements are paused every winter (between October and April) to avoid damage to the equipment.However, the equipment is reinstalled early enough to capture the spring rise in concentrations, and winter sediment transport can be considered negligible (Schmidt et al., 2022b).
At gauge Vernagt, water is diverted into a measuring chamber (Bergmann and Reinwarth, 1977), where turbidity can be recorded while avoiding damage to the equipment by large rocks in the main channel.

Aggregation/disaggregation to different temporal resolutions
As temporal resolutions varied between the different time series, we aggregated or disaggregated the data to achieve homogenous temporal resolutions for the respective QRF models in daily, hourly and 10-minute resolution.Data in 10-minute resolution were only needed for the gap-filling model at gauge Vernagt.For this, we had to disaggregate precipitation and temperature data from 60-minute resolution in 2000 and 2001, by dividing hourly precipitation sums by 6 and replicating mean hourly temperature values for the six corresponding 10 minute time steps.
For the analysis of annual Q, P and T, we summed up daily discharge volumes as derived from daily mean discharge (Qsum [m³/day] = 60 * 60 * 24 * Qmean [m³/s]), added up daily precipitation sums and computed annual averages of daily mean temperature.At gauge Vernagt, we only considered data between May 1 st and September 30 th of each year due to inconsistent gaps in winter temperature and precipitation measurements.
To compute annual Q, P and T values, we computed daily Q sums derived from the available daily means (Qsum [m³/day] = 60 * 60 * 24 * Qmean [m³/s]), summed up daily precipitation sums and computed annual averages of daily mean temperatures.However, due to inconsistent gaps in winter temperature and precipitation measurements at gauge Vernagt, we only considered data between May 1 st and September 30 th of each year.

Quantile regression forests (QRF) for sedigraph reconstruction
To reconstruct suspended sediment dynamics, the desired models have to account for a multitude of processes controlling SSC (discharge dynamics, temperature-dependent activation of sediment sources and transport processes, precipitation, antecedent wetness conditions, etc.), need to deal with non -normal distributions of both predictor and response variables and have to handle correlations between the predictors (Zimmermann et al., 2012).
Quantile regression forests (QRF) (Meinshausen, 2006) represent a multivariate approach that can deal with nonlinearity, interactions and non-additive behavior without making assumptions on underlying distributions, and performed favorably in reproducing sediment dynamics as compared to generalized linear models or sediment rating curves (Francke et al., 2008a).QRF are a generalization of random forests (RF) regression tree ensembles (Breiman, 2001).Regression trees (a.k.a.CARTs) (Breiman et al., 1984) apply recursive rule-based data partitioning in order to group data with similar values for the response variable (Francke et al., 2008a).RF and QRF employ an ensemble of these trees, each one grown on a random subset (bootstrap sample) of the training data.Predictive performance is evaluated on those parts of the data that are not considered in the bootstrap sample, i.e. the out-of-bag data (OOB) (ibid.).Model predictions are then obtained from the mean of predictions of each single tree (RF) or based on the distribution of these single-tree predictions (QRF).As QRF keeps the value of all observations within a node, it enables the quantification of the model uncertainty, which represents an advantage to other non-parametric approaches applied to SSC modelling, such as fuzzy logic or artificial neural networks (ibid.).
We built on the model setup developed by (Francke et al., 2008b;Zimmermann et al., 2012), who integrated antecedent conditions of the considered primary predictors over various aggregation periods.Extending this idea , we added temperature (and antecedent temperature conditions) to the group of predictors, as many processes determining sediment dynamics in high alpine are temperature-sensitive (i.e. the activation of sediment-rich glacial meltwaters).
Secondly, we tuned the model with respect to the length of the antecedent periods considered: To describe antecedent conditions of Q, P and T while keeping correlation between the derived predictors as low as possible, we used non-overlapping time windows of increasing sizes from the primary predictors Q, P and T (Zimmermann et al., 2012) and optimized the model performance with respect to the length of these time windows.We optimized the model results with respect to the Nash-Sutcliffe efficiency index (see section 3.2.1) in a 5-fold cross validation, that divided the training data into five equal temporally contiguous chunks.
To derive annual suspended sediment yields, we performed 250 Monte-Carlo realizations for each year which allows assessing the prediction uncertainty.From these, the mean and quartiles of annual suspended sediment yields are computed.

General modeling approach and adaptations to conditions at the two gauges
We combine the analysis of the two gauges Vent and Vernagt to gain a more reliable and comprehensive understanding of past sediment dynamics in the Rofental.In this, the data situations at the two gauges bear different challenges and opportunities.(Francke et al., 2008a).
On the other hand, the 16-year gap between the measurement periods provides the extraordinary opportunity to verify the model's skill in reconstructing past SSC against real measurement data.
To address these issues and benefit of the opportunities, we preformed three preparatory steps before the final reconstruction models (fig.2): 3. Gap-filling model: At gauge Vernagt, we trained a model on SSC determined from 131 water samples and the predictors (Q, P, T) in the highest possible (i.e.10-minute) resolution (fig.2, upper panel).We used the resulting modelled SSC to replace periods in the 2019/20 SSC data that were affected by siltation or sensor saturation.As the sampling scheme was customized to cover hydro-sedimentological conditions as widely as possible (see Schmidt et al., 2022), the water samples also partially cover the periods to be addressed by the gap filling.
3. Validation A -temporal resolution: we ran models in both hourly and daily resolution at gauge Vernagt to assess the error magnitude due to the coarser temporal resolution.(Francke et al., 2008a).For example, if the discharge measured on a day in the reconstruction period (Qrec) exceeds the maximum discharge within the training period (Qtrain, max), the model potentially cannot make a reliable prediction of SSC on that day.Therefore, we need to assess whether and how often the ranges of the predictors in the training data were exceeded during the reconstruction period.

Validation
where Q is discharge [m³/s], where ∆t is the corresponding temporal resolution [s] and  sums over all timesteps of the day, where A [km²] is the catchment area and  sums over all timesteps of the year.
To quantify model performance and for our validation, we used the Nash-Sutcliffe efficiency index: where 'obs' and 'mod' refer to observed and modelled SSC values and   ̅̅̅̅̅̅̅̅̅ is the mean of observed SSC values (Zimmermann et al., 2012;based on Nash and Sutcliffe, 1970).The NSE is dimensionless and ranges from -∞ to 1, and a model with NSE over 0.5 can be considered accurate (Moriasi et al., 2007;Mather and Johnson, 2014).
However, sediment export in the study area is highly seasonal (see also Schmidt et al., 2022) so that the NSE might be misleading, as models reproducing seasonality but fail to reproduce smaller fluctuations can still report a good NSE value (Schaefli and Gupta, 2007).Thus, we additionally computed the normalized benchmark efficiency as follows: where SSCbench refers to the benchmark model suspended sediment concentration at timestep i. Commonly, this benchmark model is the mean of the observations for every Julian day over all years within n (i.e. the mean annual cycle) (see, e.g., Pilz et al., 2019).(Pilz et al., 2019)However, this is heavily influenced by individual events in our case, so we used the 60-day moving average of the mean SSC for every Julian day instead.As for the NSE, a BE value of 1 corresponds to perfect agreement of simulation to measurements and a model with BE > 0 is able to reproduce dynamics better than simply using statistics (Pilz et al., 2019).

Methods for trend analysis and change point detection
In order to quantify time series behavior, we generally followed the approach of first analyzing for existence of a trend.If a trend was detected, we assessed whether the trend was homogenous by analyzing for change points.If a change point was identified, we then examined the resulting segments of the time series for trends.
Most of the investigated time series are not normally distributed and some show autocorrelation.Thus, to calculate trend significance, we used the non-parametric Mann-Kendall test for linear trend detection in a version that was modified to detect trends in serially correlated time series (Yue and Wang, 2004) as recommended by (Madsen et al., 2014).To estimate trend magnitude, we used Sen's slope (SS) estimator (Sen, 1968).Both methods are implemented in the "mkTrend" function of the R-package "FUME" (Santander Meteorology Group, 2012).In our results, we only plot and refer to trends that were significant at least at a significance level α = 0.05 after correction for autocorrelation.
For change point (CP) detection, we used the non-parametric Pettitt's test (Pettitt, 1979), which is commonly used as it is a powerful rank-based test for a change in the median of a time series and robust to changes in distributional form (Yue et al., 2012), as implemented in the R package "trend" (Pohlert, 2020).However, it only gives one change point location without uncertainty quantification around its location and was shown to be sensitive to the position of the change point within the time series, i.e. detection at the beginning and end of the series is unlikely (Mallakpour and Villarini, 2016).
Thus, as a complementary advanced approach that counterbalances the weaknesses of Pettitt's test, we used Bayesian regression with change points as implemented in the R-package "mcp" (which stands for "multiple change points"; hereafter we refer to this method as "MCP") (Lindeløv, 2020).This represents a much more flexible approach and which allows to asses uncertainty through the resulting posterior distributions of the change point location and is applied in an increasing number of studies in different fields of research (e.g.Veh et al., 2022;Yadav et al., 2021;Pilla and Williamson, 2022).Although mcp MCP allows to detect multiple change points, we only considered one change point as we aimed to detect the largest shift in the time series and to ensure comparability with Pettitt's test.Unless specified otherwise, we used the uninformative default prior, allowed free slope estimation before and after the change point, assumed a disjoined slope (i.e.step-like change) at the change point and allowed for changes in variance at the change point.
As mentioned earlier, we computed 250 Monte-Carlo realizations of the annual SSY as a result of the QRF model.
We propagated this uncertainty by applying the trend and change point detection methods not only to the mean estimates but also to the 250 resulting time series realizations.
All calculations were done in R, version 4.2.1 (R Core Team, 2018).

Validation A: Influence of reduction of temporal resolution
As described earlier, tThe temporal resolution for long-term reconstruction is limited to daily, as the respective long-term predictor data are available only in at daily resolution at gauge Vent.As a result, we can expect some loss of information, e.g. on short-term precipitation intensity, which that can be crucial for sediment dynamics, such as short-term precipitation intensity.To assess the error magnitude, we ran two variants of the models at gauge Vernagt, in daily and hourly resolution based on all available training data (i.e. 2000, 2001, 2019 and 2020).
We then compared daily sediment discharge (Qsed) calculated from the out-of-bag model estimates to Qsed calculated from measured turbidity (Figfig.3a).It is important to stress that out-of-bag (OOB) estimates of any given day represent model predictions of only those regression trees where the particular day was not part of the training data.Both models reproduced daily Qsed very well, where the daily model shows showed a larger scatter, which is also reflected in the slightly lower NSE and BE.The comparison of annual sSSY (Figfig.3b) shows showed very similar estimates in most years.To rule out that model performances were strongly influenced by discharge (which is both a predictor in the model and used to calculate Qsed), we also compared hourly and daily out-of-bag SSC instead of Qsed.Yet the resulting NSE of 0.97 and 0.82 and BE of 0.95 and 0.73 for the hourly and daily models, respectively, still represent very good model performance in general and show that the loss of model skill between the two models because of different temporal resolutions is acceptable.
As the loss of model skill and difference between the two model results are small,Thus, we used the daily resolution models at both gauges in the following analyses for better comparability and applicability to the full length of the available time series.At gauge Vent, more years of data were available, so that the results of the five-fold cross-validation provided a more detailed picture of the temporal extrapolation ability (Table 1).Some cross-validation periods showed rather low NSE values, which indicates that they are harder to predict with only the remaining training data, and therefore contain more valuable data for training the full QRF model.For example, the period 2012 to 2014 with the lowest NSE contains the most extreme event in August 2014, which was likely linked to mass movements (Schmidt et al., 2022b).Similarly, the period 2018 to 2020 also shows a low NSE and a mass-movement event was observed in 2020 (ibid.).Including such periods in the training data apparently is of importance to the fidelity of the model.
Conversely, the same cross-validation with a sediment rating-curve shows that QRF performance is much better, especially in those periods that are harder to predict.Consequently, QRF clearly excels over the traditional approach (see also section 4.6).

Exceedances of predictor ranges in the past
It is a known limitation of QRF that it is not possible to extrapolate beyond the range of values represented in the training data (Francke et al., 2008a).For example, if the discharge measured on a day in the reconstruction period (Qrec) exceeds the maximum discharge within the training period (Qtrain, max), the model potentially cannot make a reliable prediction of SSC on that day.Therefore, we need to assess whether and how often the ranges of the predictors in the training data were exceeded during the reconstruction period.
At gauge Vent, maximum Temperature of the training period (Ttrain, max) was not exceeded during the reconstruction period.Maximum discharge of the training period (Qtrain, max) was overstepped on 4 days in 1987 and precipitation within the reconstruction period was larger than the maximum precipitation during the training period (Ptrain, max) on 3 days within the reconstruction period.
At gauge Vernagt, Qtrain, max was exceeded 6 times in the reconstruction period, and 4 four of these days occurred in 2003.There was one day at gauge Vernagt, were Ttrain, max was exceeded in 2017.P train, max was exceeded on 5 days within the reconstruction period, however on 2 of these days, the temperature was below zero and discharge was very low, so we considered these days negligible for annual sediment export.
There were also days at both gauges when discharge was lower than the minimum discharge measured discharge during the training period (Qtrain, min).Yet all of these days were in April, May or October, when SSC are very low, and as very low discharge also translates to very low transport capacities, we considered the error negligible for annual sediment yields.

Performance of the reconstruction models
To test the performance of the reconstruction models in Vent, we compared out-of-bag SSC estimates of the reconstruction models and SSC estimates of sediment rating curves (trained on all available data) to turbidity measurements at both gauges (Fig. 5).
QRF performance is superior to sediment rating curves (SRC) at both gauges with respect to mean daily SSC: as can be seen in Fig. 5 and the NSE and BE values, SRC are generally less capable of capturing the variability in the training data.The NSE and BE values of the QRF model represent a satisfactory performance at gauge Vent and very good performance at gauge Vernagt (Moriasi et al., 2007).Considering mean annual SSC at gauge Vent, QRF performs much better than SRC.At gauge Vernagt, QRF performs very well and slightly better than SRC, but there are only 4 years available for comparison, which confines the representativity of this analysis.
Performance with respect to Qsed is slightly better for both models as compared to the comparison against SSC.
SRC estimates yield values for NSE of 0.73 and BE of 0.61 at gauge Vernagt and NSE of 0.49, and BE of 0.35 at gauge Vent.Still, QRF performance is superior (NSE of 0.89 and BE of 0.84 at gauge Vernagt and NSE of 0.64 and BE of 0.54 at gauge Vent).For both models, performance is better at gauge Vernagt than at gauge Vent.
plotted daily out-of-bag Qsed predictions and Qsed from turbidity (fig.5a) as well as modelled and measured annual sSSY (fig.5 b).This shows that rareRare days with extremely high Qsed SSC (and Qsed above ca.7000 t/day) are systematically underestimated by both models..However, it is important to emphasize that Fig. 5 shows out-of-bag (OOB) predictions, i.e. the QRF model predictions for these "extreme" days if the particular day in question is not part of the training data.To quantify the effect of this underestimation of daily Qsed on annual SSY for the QRF-model, we calculated the difference between measured and estimated daily Qsed (OOB) for the 10 days with the highest Qsed in the turbidity time series at both gauges.The differences correspond to 0.6 to 2.8 % of the annual SSY at gauge Vernagt, and 1.7 to 19.1 % of the annual SSY at gauge Vent.However, the 19.1 % underestimation are attributable to the most extreme event in the time series, where 26 % of the annual SSY were exported within 25 h in August 2014, likely in association with a mass movement event (see Schmidt et al., 2022).
The full model estimate used asof the 'reconstruction model' (i.e.not OOB estimates) for this day shows an underestimation of only 6 %.
The NSE of modelled vs. measured Qsed of 0.64 represents a satisfactory performance (n = 2760) as does the BE of 0.54.This is much lower than the NSE of 0.89 and BE of 0.84 of the daily model at gauge Vernagt (fig.3a).

Annual sSSY and their development over time
In the resulting time series, the average sSSY of all years (± 1 standard deviation) is 1401 (± 453) t/km²/yr at gauge Vent and 1383 (± 668) t/km²/yr at gauge Vernagt.This indicates overall similar magnitudes of sediment export per catchment area, yet with much higher variability at gauge Vernagt.To assess how suspended sediment dynamics changed over time, we analyzed the time series of annual sSSY at the two gauges for trends and change points.
At gauge Vent, Pettitt's test yields a significant change point in 1981 in annual sSSY, which is also true for 99.6 % of the realizations (for 1 realization, the change point was detected in 1980).Accordingly, the MCP mcp analysis  We derived mean annual temperatures at gauge Vent, however, at gauge Vernagt, we computed mean summer temperatures (between May and September) instead, as temperatures are missing in winter in many years (see also

Discussion
In the following, we will briefly summarize the most important results, before we discuss weaknesses and strengths of the presented work and integrate the results in the broader context of the literature.
In our model evaluation, we found that QRF models in reduced temporal resolution (daily as compared to hourly) still produced satisfactory results (validation A) and performed well in reconstructing past Qsed and sSSY (validation B).In the reconstructed time series of annual sSSY, we detected positive trends at both gauges with high probabilities of step-like changes around 1981.This coincides with a critical point in the mass balances of the two largest glaciers in the catchment area, as well as high change point probabilities in discharge volumes and July temperatures at both gauges.With respect to precipitation, we found some positive trends but no clear change points.In the time after 1981, we found opposing trends at the two gauges, with decreasing sSSY at gauge Vent and increasing sSSY at gauge Vernagt.

Part I: Model evaluation
The presented study aimed to examine extensively, whether Quantile Regression Forest is a suitable method for estimating past sediment export.Overall, the final QRF reconstruction models outperformed sediment rating curves (SRC) by about 20 percent-point in explained variance at both gaugeseven though we assessed out-ofbag (OOB) estimates of QRF versus rating curves based on all available data.OOB performance was better at gauge Vernagt (NSE of 0.82 with respect to daily SSC) than at gauge Vent (NSE of 0.6), although even the latter corresponds to a satisfactory performance (Moriasi et al., 2007).We suggest that there are three reasons for this difference in performance.First, the Vent catchment is much larger (almost 100 km²) than the 11.4 km² Vernagt catchment.Thus, suspended sediment dynamics at gauge Vent integrate over more different processes, which adds to the complexity compared to the mostly melt-driven Vernagt catchment.Second, precipitation measurements at gauge Vent are unlikely to be representative of the entire catchment, due to the catchment size and especially due to topographical effects.Thus, more localized rainstorms are likely to be captured poorer.And third, SSC in Vent reach much higher values than at gauge Vernagt, so that uncertainties in the turbidity measurement are more relevant: at low concentrations, the light emitted by the turbidity sensor is predominantly scattered by solid particles, while at high concentrations, absorption becomes the dominant process.This causes the relation between the light detected by the photo sensor in the turbidity probe and SSC to become non -linear (Merten et al., 2014) and leads to a much higher variance in the SSC-turbidity relationship.However, despite these effects and the lower NSE with respect to daily Qsed at gauge Vent, annual sSSY show very good agreement.Thus, we conclude thatgiven the availability of a large enough and varied training dataset and sufficiently long records of the predictors -QRF represents a reliable tool with the ability to broaden our understanding of the response of high-alpine areas to climate change in the past decades.
Nevertheless, a few limitations need to be considered.As a major limitation, QRF cannot extrapolate if predictors in the reconstruction period exceed the range of values represented in the training dataset (Francke et al., 2008a).
In these cases, the modelled values will potentially suffer from over-or underestimation.At the same time, assessing the number of such exceedances in the period of model application provides an indication of the representativity of the training data set, which must be sufficiently large and varied for such data-driven approaches (Vercruysse et al., 2017).In our case, exceedances were rare (seven days at gauge Vent and 10 days at gauge Vernagt), so we consider the error negligible for annual sediment export.We strongly encourage future studies to assess the number and severity of such exceedances for this purpose, especially in view of changing conditions, e.g.due to (prospective) climate change.
More generally, the QRF model can only reproduce the quantitative and qualitative conditions as represented within the training data.This has several implications.First, the spread of the QRF model results needs to be interpreted as a minimum estimate of uncertainty, as it can only reproduce the "known unknown", i.e. the variability represented in the training data and their respective relationships or the lack thereof.And second, major qualitative changes in the functioning of the modelled system, such as changes in connectivity e.g. through the formation of proglacial lakes or large-scale storage of sediments along the flow paths, cannot be captured.
However, for the catchments considered in this study, such major changes are at least not indicated in historic aerial images (Laser-und Luftbildatlas Tirol, 2022) and the longitudinal profile of the major water courses are very steep, which precludes significant sediment storage.
We found that our models at both gauges underestimated rare, high daily SSC and Qsed (figures 3 -5).This is not surprising given that these events are rare and that figures 3 to 5 show out -of-bag predictions, which means that the respective estimates are based only on those trees that have seen few or none of these conditions.Including them in the training of the final reconstruction model alleviated this effect.
Further, we suspected that the aggregation of precipitation and discharge to daily values might involve some loss of information e.g. on sub-daily precipitation intensity and maximum discharge, which would very likely affect sediment export estimates.Indeed, the underestimation is a little more pronounced in the daily model as compared to hourly resolution, yet the difference was relatively small (Validation A, Fig. 3 a).Adding to this, the underestimation at the daily scale does not seem to propagate to annual estimates, as high annual sSSY are not systematically underestimated to the same extent (Fig. 5b) and the underestimation of rare events at the daily scale has a limited effect on the annual estimates (section 4.4).This is in accordance with the finding that only about ¼ of the annual yield in Vent is transported during (precipitation) events (Schmidt et al., 2022b), as opposed to other fluvial systems where the majority of the annual sediment yield is transported by several extreme events (Delaney et al., 2018).Consequently, the proposed QRF model is also applicable at in daily resolution, enabling its application to the longer time series available.
We found that extraordinarily high daily Qsed are underestimated by our models at both gauges (figures 3 -5), which is a little more pronounced in the daily model as compared to hourly resolution (fig.3 a).This is not surprising given that these events are rare and figures 3 to 5 show out-of-bag predictions, which means that the respective estimates are based on trees that have seen even fewer or none of these conditions.Especially in Validation B, the validation model was trained on very few data.(Delaney et al., 2018)As another aggravating factor, the aggregation of precipitation and discharge to daily values involves some loss of information e.g. on sub-daily precipitation intensity and maximum discharge, which very likely affect sediment export.Yet as Validation A showed, this loss of information is relatively small (fig.3 a).Additionally, this underestimation at the daily scale does not seem to propagate to annual estimates, as high annual sSSY are not systematically underestimated to the same extent (fig.5b).(Schmidt et al., 2022b) We In interpreting these results, it has to be considered that the amount of data used for training is very small and less than half of the final reconstruction model (212 of 579 days), while the amount of training data is known to be crucial in data-driven models (e.g.Vercruysse et al., 2017).Specifically, turbidity recordings only started in mid-July 2019, so that only one spring season was available for training.Thus, given the rigor ousness of this test as well as the temporal distance of 20 years, we find that the validation yields satisfactory results with good agreement The overall magnitudes of annual yields at the two analyzed gauges fall at the high end compared to an extensive collection from the European Alps (Hinderer et al., 2013) and are in good accordance with yields from other catchments in the Stubai and Ötztal Alps (Schöber and Hofer, 2018;Tschada and Hofer, 1990) (see also Schmidt et al., 2022).Their reconstruction for the past five decades constitutes an important contribution to the understanding of long-term sediment budget, as other existing records commonly only cover some decades.
The reconstructed annual sSSY at both gauges show overall positive trends and change points around 1981.Indeed, the two change point detection methods did not agree at gauge Vernagt, where MCP yielded high change point probabilities around 1981, whereas Pettitt's test detected a significant change point in 1989 (or 2002 in some realizations).Similarly, the results did not agree for July temperatures and discharge at gauge Vernagt.We attribute this to a limitation of Pettitt's test, which is known to be much less sensitive to break points located near the beginning or end of the time series (Mallakpour and Villarini, 2016).Thus, we conclude that there is a high probability of change points around 1981 in the reconstructed sSSY as well as discharge and July temperature time series at gauge Vernagt, as the MCP probability distributions are very narrow and similar to the corresponding probability distributions in Vent.The overall magnitudes of annual yields at two gauges fall at the high end compared to an extensive collection from the European Alps (Hinderer et al., 2013) and are in good accordance with yields from other catchments in the Stubai and Ötztal Alps (Schöber and Hofer, 2018;Tschada and Hofer, 1990) (see also Schmidt et al., 2022).
Our results suggest that the step-like increase in modelled sSSY is linked to the onset of increased ice melt.Mean  1982 and 1983.July temperatures are especially relevant for firn and glacier melt, since July is the month with the highest firn and ice melt contribution to discharge, after snow melt contributions have peaked in June and snow cover has decreased substantially (Kormann et al., 2016;Weber and Prasch, 2016;Schmieder et al., 2018;Schmidt et al., 2022b).This shift is reflected in discharge, which shows a step-like increase around 1981 at both gauges and continues to be elevated after this change point.The analyses of monthly discharge showed that July was the only month with a change point around 1981 at both gauges, which again emphasizes the dominance of increased firn and ice melt.In contrast, we did not find evidence for a change in precipitation sums around 1981, which would indicate that enhanced hillslope erosion on snow-or ice-free surfaces played a crucial role in the sSSY increase.With regard to suspended sediment dynamics, it is conceivable that the increase in ice melt trans lates to an increase in sediment-rich glacial meltwater (Delaney and Adhikari, 2020) as well as intensified fluvial erosion of sediment-rich proglacial areas.
These changes in the predictors of our QRF model are also reflected in the mass balance time series of the Hintereisferner and Vernagtferner, which were almost exclusively negative after 1980, with very negative summer mass balances in 1982and 1983. Escher-Vetter (2007) and Abermann et al. (2009) have also shown that mass loss started in 1981 at both glaciers, resulting from negative summer balances.With respect to the Vernagtferner, the drastically higher ablation area ratio of almost 80 % in 1982 as compared to around 25 % in the preceding years (Escher-Vetter and Siebers, 2007) indicates that large areas of the glacier became snow-free in 1982.This entails crucial changes in albedo and therefore intensified ice melt and thinning of firn areas due to rising energy absorption at the glacier surface (Escher-Vetter, 2007;Braun et al., 2007).We interpret this as a regime shift as summer mass balances continue to be lower than before 1981 although (July) temperatures decrease again, with the exception of 1984, which was characterized by a relatively high number snowfall days during the ablation period (Escher-Vetter and Siebers, 2007).
Similar step-like increases in sediment export have been reported from the upper Rhône basin in Switzerland (Costa et al., 2018), (only slightly later, (around 1985,), which is likely due to lower average temperatures in the Rhône basin as compared to the upper Ötztal), as well as for the headwaters of the Yangtze River on the Tibetan plateau (Li et al., 2020;Zhang et al., 2021) and 28 headwater basins in High Mountain Asia (Li et al., 2021).Thus, we deem it unlikely that the change points identified in the present study are related e.g. to a change in measurements, also because they were detected at both gauges in the same year and in both (July) temperature and discharge.More generally, substantial increases in sediment export in response to climate change have been reported from cold environments around the globe (e.g.Bogen, 2008;Koppes et al., 2009;Bendixen et al., 2017b;Singh et al., 2020;Vergara et al., 2022) and are in accordance with state-of-the-art conceptual models, which expect that phases of glacier retreat (and re-advance) lead to the highest increase in sediment yield across glacial cycles (Antoniazza and Lane, 2021), as has been confirmed by several studies (Lane et al., 2017(Lane et al., , 2019;;Micheletti and Lane, 2016)..
The overall magnitudes of annual yields at two gauges fall at the high end compared to an extensive collection from the European Alps (Hinderer et al., 2013) and are in good accordance with yields from other catchments in the Stubai and Ötztal Alps (Schöber and Hofer, 2018;Tschada and Hofer, 1990) (see also Schmidt et al., 2022).
In our analysis of reconstructed annual sSSY at the two gauges, the two change point detection methods did not agree at gauge Vernagt: mcp yielded high change point probabilities around 1981, whereas the Pettitt's test detects a significant change point in 1989 and 2002.Similarly, results do not agree for July temperatures and discharge at gauge Vernagt.We attribute this to a limitation of Pettitt's test, which is known to be much less sensitive to break points located near the beginning or end of the time series (Mallakpour and Villarini, 2016).We proceed on the assumption that there is a high probability of break points around 1981 in th e reconstructed sSSY as well as discharge and July temperature time series at gauge Vernagt, as the mcp probability distributions are very narrow and similar to the corresponding probability distributions in Vent.
A question to be considered is whether the detected change points are credible or e.g.due to measurement errors.
Since they were detected at the two gauges independently, we deem it unlikely that e.g. a change in measurements caused the change points.Furthermore, a very similar step-like increase in measured SSC has been reported in the upper Rhône basin in Switzerland (Costa et al., 2018), only slightly later (around 1985), which is likely due to lower average temperatures in the Rhône basin as compared to the upper Ötztal.The authors attributed this shift to a step-like increase in mean annual temperature, which lead to enhanced ice melt, reduced snow cover and snowmelt contributions.They concluded that the onset of increased ice melt played a dominant role in the SSC increase, which matches the expectation that phases of glacier retreat (and re-advance) lead to the highest increase in sediment yield across glacial cycles (Antoniazza and Lane, 2021), which has been confirmed by several studies (Lane et al., 2017(Lane et al., , 2019;;Micheletti and Lane, 2016).(Li et al., 2020;Zhang et al., 2021) (Li et al., 2021) Our results suggest that in the Rofental, the step-like increase in sSSY reconstructed by our models is linked to the onset of increased ice melt as well, as we will discuss in section 6.3.
Interestingly, we found opposing trends in the time after 1981, with strongly increasing annual sSSY at gauge Vernagt and decreasing sSSY at gauge Vent.To some extent, this is mirrored toreflected in changes in discharge, where we found no significant trend (and Sen's slope close to zero) after 1981 at gauge Vent, but a strong positive trend at gauge Vernagt.Altogether, this could indicate different timings of "peak sediment" (Ballantyne, 2002;Antoniazza and Lane, 2021) and thus a stabilization or compensation of the larger Vent catchment as opposed to the nested Vernagt catchment.Some independent observations support this nothion: For example, the glacier tongue of the Hochjochferner (denoted as "HJF" in Fig. 1)a stabilization or compensation of the larger Vent catchment as opposed to the nested Vernagt catchmen t (Ballantyne, 2002;Antoniazza and Lane, 2021).This is not implausible: For example, the glacier tongue of the Hochjochferner (denoted as "HJF" in fig. 1 et al., 2007).We interpret this as a regime shift as summer mass balances continue to be lower than before 1981 although (July) temperatures decrease again, with the exception of 1984, which was characterized by a relatively high number snowfall days during the ablation period (Escher-Vetter and Siebers, 2007).
This shift in glacier melt is mirrored in discharge, which shows step-like increases around 1981 at both gauges and continues to be elevated after the change point.The analyses of monthly discharge showed that July was the only month with a change point around 1981 at both gauges, which again emphasizes the dominance of increased firn and ice melt.In contrast, we did not find evidence for a change in precipitation sums around 1981, which would indicate that enhanced hillslope erosion on snow-or ice-free surfaces played a crucial role in the sSSY increase.
With regard to suspended sediment dynamics, it is conceivable that the increase in ice melt translates to an increase in sediment-rich glacier meltwater (Delaney and Adhikari, 2020) as well as intensified fluvial erosion of sedimentrich proglacial areas.

Outlook
The presented study demonstrated that quantile regression forest is a suitable method for estimating past sediment export (Zhang et al., 2022), given the availability of sufficient SSC measurements for training and sufficiently long records of the predictors.Thus, it represents a promising tool with the ability to broaden our understanding of the response of high-alpine areas to climate change in the past decade.What is more, this study highlights the added value of change point identification in addition to trend analyses in the respective time series, which visualizes sudden changes in the analyzed systems and thus facilitates a better understanding of critical time periods.Future studies could help gain more knowledge on decadal-scale sediment export from high alpine areas by applying the presented approach to other catchments.In turn, advancing knowledge on past changes will support and prepare the development of management and adaptation strategies.

Conclusions
The lack of sufficiently long measurement records of suspended sediment concentrations (SSC) has resulted i n a gap of knowledge on decadal-scale sediment export and its response to past climatic changes as opposed to longer Further, we explored advantages and limitations of the QRF approach.As a major limitation, QRF may yield underestimates if predictors exceed the range of values represented in the training dataset.To estimate the effect on the model results, we suggest to assess the number such exceedances, which can also be leveraged to evaluate the representativity of the training dataset.Further, we found that events with very high SSC and Qsed tend to be underestimated.While this is not surprising given the rareness of such events, the effect of such underestimations on the annual yields was small.The assessment of the temporal extrapolation ability revealed a satisfactory performance of QRF and illustrated again the importance of a sufficiently varied dataset, i.e. including larger events.A comparison of QRF to sediment rating curves in a five-fold cross-validation showed that QRF was better able to model periods that contain very high SSC caused by mass movements.This points to the favorable ability of QRF to model threshold effects, which is a major advantage compared to approaches bound to continuous relationships.The final QRF reconstruction models showed good performance at both gauges and outperformed SRC by about 20 percent-points of explained variance.We conclude that the presented approach is a helpful tool for estimating past sediment dynamics in catchments where long enough SSC measurement series are lacking.The presented approach can provide a helpful tool for estimating past sediment dynamics in catchments where long enough SSC measurement series are lacking.This can aid in gaining more knowledge on sediment export on decadal scales, which to date is still limited, as well as prepare management and adaptation strategies to future changes.
Our validations showed that the model performs well in reproducing past sediment dynamics.Limitations include the underestimation of days with extremely high SSC, which are less frequently represented in the training dataset.
However, this does not seem to affect annual sediment yields to the same extent.As a more general limitation, QRF cannot extrapolate beyond the range of values represented in the training data, but the number of days where this becomes problematic is limited in our case, because of the large and sufficiently varied dataset.
We analyze the resulting annual specific suspended sediment yields (sSSY) with respect to trends and change points.At both gauges independently, annual sSSY show positive trends as well as step-like increases after 1981.
As this coincides with exceptionally high July temperatures in 1982 and 1983, distinct changes in the glacier mass balances of the two largest glaciers in the catchment area and a sudden increase in ablation area of one of the glaciers, we conclude that temperature-driven enhanced glacier melt is responsible for the increase in sSSY.This is also mirrored in discharge measurements at both gauges, which, show change points around 1981 as well.These findings also demonstrate the importance of assessing change points in addition to trend analyses, in order to detect shifts in geomorphic systems.The presented approach can provide a helpful tool for estimating past sediment dynamics in catchments where long enough SSC measurement series are lacking.This can aid in gaining more knowledge on sediment export on decadal scales, which to date is still limited, as well as prepare management and adaptation strategies to future changes.
Future studies could help gain more knowledge on decadal-scale sediment export from high alpine areas by applying the presented approach to other catchments.In turn, advancing knowledge on past changes will support and prepare the development of management and adaptation strategies.

2.
Validation A -temporal resolution: we ran models in both hourly and daily resolution on all available training data at gauge Vernagt to assess the error magnitude due to the coarser temporal resolution in the final reconstruction models.3. Validation Bextrapolation: we trained a model on the corrected Vernagt SSC data of 2019 and 2020 in daily resolution and estimated SSC back until 2000 for validation against the 2000-2001 measurement data (Fig. 2, center).
in the appendix) next to the gauge since 1974.To fill the present gaps, we used data recorded by the Hydrographic Service since 2010 in close proximity to the Vernagt station.As some gaps still remained, we subsequently used data recorded at Martin-Busch-Hütte (conversion factors: TempVernagtF = = -0.002536• TempMB²+ 0.9196*TempMB -0.474; PrecipVernagtF = 0.895 • PrecipMB).With this, 12 % of the precipitation time series and 9 % of the temperature time series were filled, although many of these filled gaps occur during the winter months, when the discontinued discharge data inhibit QRF modelling anyway (see Figfig. Turbidity was recorded in the summers of 2000 and 2001 (Staiger-Mohilo STAMOSENS 7745 UNIT) (Naeser, 2002) as well as 2019 and 2020 (Campbell OBS501).Water samples for calibration of turbidity to SSC were taken directly next to the turbidity sensor, by hand in 2000 and 2001 (57 samples), and by means of an automatic sampler (ISCO 6712) in 2019 and 2020 (131 samples).The latter initiated sampling if one of two criteria was met: (i ) regular sampling, to avoid long gaps between two samples, (ii) threshold-based sampling to obtain samples across the whole range of possible turbidity values.Gravimetric sediment concentrations SSCg were then determined in the laboratory and used to convert turbidity to SSC (2000-20012000/01: SSC [g/l] = 0.1583 * turbidity [V] -13.0877 (Naeser, 2002); 2019/20: SSC [g/l] = 0.00212*turbidity [FNU]).

Figure 3 :
Figure 3: a) Daily Qsed calculated from out-of-bag prediction of daily and hourly models vs. Qsed calculated from turbidity at gauge Vernagt; b) Comparison of mean annual sSSY estimates of the daily and hourly models at gauge Vernagt.NSE: Nash-Sutcliffe-efficiency; BE: benchmark efficiency.Dashed lines indicate ratio of 1:1.

Figure 4 :
Figure 4: a) Daily modelled Qsed estimates from the QRF validation models vs. Qsed from turbidity at gauge Vernagt versus Qsed from turbidity for the years 2000 and 2001 at gauge VF. b) Annual sSSY based on turbidity observations, estimates of the validation model (without 2000-20012000/01 data) and estimates of the reconstruction model (all training data) in 2000 (left) and 2001 (right).This comparison was not possible for 2019 and 2020, since observation data are missing in May -July 2019 and May -June 2020.Subdivision into seasons: spring (bottom, Apr -Jun), summer (middle, Jul -Sep), autumn (top, Oct -Dec).Boxes depict mean model estimates and whiskers depict 2.5 and 97.5 percentiles of model predictions.NSE: Nash-Sutcliffe-efficiency; BE: benchmark efficiency.
Nevertheless, with respect to annual estimates, the NSE of modelled sSSY vs. sSSY derived from turbidity of 0.83 show very good agreement (fig.5 b).

Figure 5 :
Figure 5: Measured versus modelled sediment concentration and yield using QRF (OOB-estimates, red) and SRC (blue).Top row: gauge Vent; bottom row: gauge Vernagt; left: QRF, daily values, centre: SRC, daily values, right: QRF and SRC, annual values.NSE: Nash-Sutcliffe-efficiency; BE: benchmark efficiency.a) Daily Qsed calculated from out-of-bag prediction of model vs. Qsed calculated from turbidity at gauge Vent; b) Comparison of mean modelled sSSY and measured annual sSSY at gauge Vent.NSE: Nash-Sutcliffe-efficiency; BE: benchmark efficiency.

Formatiert:
Einzug: Links: 0,63 cm, Keine Aufzählungen oder Nummerierungen shows a rather narrow probability density distribution around 1980/81 for all realizations with maximum probability density in 1981.At gauge Vernagt, MCP mcp shows very similar probability density distributions to that of Vent with maximum probabilities in 1981 for all realizations.However, Pettitt's test detects a change point in 1989 (p < 0.01; 66 % of realizations in 1989 and 27 % in 2002).As we elaborate in the discussion, this is likely due to a limitation of the Pettitt's test.Thus, we divided both time series in 1981 to examine the resulting segments for trends.In the first segment, no significant trends were detected at gauge Vernagt, and at gauge V ent, only two of the 250 realizations show significant positive trends (SS of 11.1 and 3.9 t/km²/a; see Figfig.5).In the second segment (i.e.after 1981), we detected a negative trend in mean annual sSSY at gauge Vent (SS = -7.6 t/km²/a), as well as in 42 realizations (SS of -13.1 to -5.9 t/km²/a).In contrast, at gauge Vernagt mean sSSY (SS = 23.5 t/km²/a) as well as 248 of the realizations (SS of 20.2 to 27.9 t/km²/a) show strong positive trends.The average sSSY (± 1SD) of all years after 1981 is 1579 (± 391) at gauge Vent and 1537 (± 603) at gauge Vernagt.

Figure 6 :
Figure 6: Mean specific annual suspended sediment yields (sSSY) as reconstructed by the QRF model (black points).Whiskers depict 2.5 % and 97.5 % quantiles of the 250 QRF realizations.Overall trend of mean annual sSSY is given by Sen's Slope (black line) and trends of all 250 QRF realizations in grey.Change points determined by Pettitt's test (red dashed line) and MCP mcp (posterior distributions of 250 realizations as solid light red and mean as dark red lines).Trends in the time series segments before and after 1981 are given as dark blue (mean) and lightblue (realizations) lines (only plotted if significant).

Figfig.
Figfig.A1 in the appendix).Both timeseries show significant positive trends but no clear change points, as Pettitt and MCPmcp disagree and MCP mcp yields very widespread probability distribution (Figfig.7 ea) and fb).Additionally, we analyzed mean monthly temperatures over time and found that while most months show positive trends, at both gauges July is the only month with a high change point probability around 1981 (Figfig.7 ea) and fb).At both stations, July temperatures in 1982 and 1983 were exceptionally warm.Annual specific discharge sums at both gauges show significant positive overall trends (fig.7 c) and d).In Vent, both change point detection methods indicate high probabilities of a change point in 1981.At gauge VF, the two methods disagree as in the case of annual sSSY, as the Pettitt test detects a change point in 1990 while mcp suggests a change point with high probability in 1981.Dividing both time series in the year 1981, both segments of the Vent time series show insignificant trends, while both segments at gauge VF show significant positive trends (1 st segment: Sen's slope = 32.7 mm/a, p < 0.01; 2 nd segment: Sen'S slope = 17.2 mm/a, p < 0.001).

Figure 7 :
Figure 7: a) and b) annual discharge yields at gauges Vent and Vernagt; c) Annual and summer (May -Sept) precipitation in Vent; d) Summer (May -September) precipitation at gauge Vernagt; e) Mean annual and July temperatures in Vent; f) Mean summer (May -Sept) and July temperatures at gauge Vernagt; g) Annual mass balances of the Hintereisferner (HEF); h) annual, winter and summer mass balances of Vernagtferner (VF)Mean annual and July temperatures in Vent; b) Mean summer (May -Sept) and July temperatures at gauge Vernagt; c) and d) annual discharge yields at gauges Vent and Vernagt; e) Annual mass balances of the Hintereisferner (HEF); f) annual, winter and summer mass balances of Vernagtferner (VF).; g) Annual and summer (May -Sept) precipitation in Vent; h) Summer (May -September) precipitation at gauge Vernagt.Dashed vertical lines show change point locations according to Pettitt's test and solid lines show Sen's Slope.Both are only drawn if they are significant at α = 0.01.Lines at the bottom show posterior density distributions of MCPmcp change point locations, with colors corresponding to the respective variables, if several variables are depicted within one plot.
assessed temporal extrapolation ability at gauge Vernagt by The second validation (Validation B, where we traininged a model on the data of 2019/20 and comparinged the estimates to measurements in 2000 and 2001 (Validation B; Figfig.4), which showed over-and underestimation of annual sSSY by 31 % and 16 %, respectively.
of dynamics on short timescales as well as annual estimates.The validation model trained on the data of 2000/2001 und LuftbildatlasTirol, 2022) and the longitudinal profile of the major water courses are very steep, which precludes significant sediment storage.
annual temperature in Vent and mean summer temperature at gauge Vernagt at both locations show gradual positive trends without a clear change point.Yet, mean July temperatures show high change point probabilities around 1981 at both locations, which is probably heavily influenced by extraordinarily high temperatures in July

(
geological) and shorter timescales.Here, we present an attempt to compensate for this lack of measurement data by estimating past suspended sediment concentrations (SSC) from recent SSC measurements and long time series of the most important hydro-climatic variables (discharge, precipitation and air temperature) by means of a quantile regression forest model.In the presented study, we tested a Quantile Regression Forest (QRF) model, which enabled the estimation of sediment export of the past five decades for the two gauges Vent and Vernagt.This allowed to analyze annual specific suspended sediment yields (sSSY) for trends and change points.Annual sSSY show positive trends as well as step-like increases after 1981 at both gauges.As this coincides with exceptionally high July temperatures in 1982 and 1983, distinct changes in the glacier mass balances of the two lar gest glaciers in the catchment area and a sudden increase in ablation area of one of the glaciers, we conclude that temperaturedriven enhanced glacier melt is responsible for the step-like increase in sSSY.This is also mirrored in discharge measurements at both gauges, which show change points around 1981 as well.Opposing trends after 1981 at the two gauges could indicate different timings of 'peak sediment'.These analyses also demonstrated the value of assessing change points in addition to trend analyses, in order to detect sudden changes in the analyzed geomorphic systems and thus facilitate a better understanding of critical time periods.
data, adapted and extended the model code and performed modelling experiments with support and supervision of TF and LKS.LKS conducted the statistical analyses with support by TF.CM reviewed and evaluated the results in the glaciological context.LKS prepared the original draft including all figures, and all authors contributed to writing of this paper.

Figure
Figure A1 Discharge (Q), Temperature (T) and Precipitation (P) data at the two gauges after gap -filling procedure.1445

Figure
Figure A2 Trends and change points in monthly discharge sums at gauge Vent.Dashed red lines indicate change points 1450 Yet, turbidity-derived SSC data have only been recorded for the years2000, 2001, 2019and 2020 (see Fig.2, upper panel "Vernagt", left side, plots labelled "SSC").Additionally, the data in 2019 and 2020 are affected by episodic siltation and periods when the turbidity sensor reached saturation: 0.47 % of the two summers of data were affected by saturation and about 8 % were affected by siltation.The latter was mainly due to one period of about 16 days in August 2019.Additionally, there were three shorter incidents (1.5 hours to 1.5 days), two in 2019 and one in 2020.These issues first need to be dealt with in order to provide accurate training data for our model, as it is sensitive to the range of values represented in the training data (Escher-Vetter, 2007;ebers, 2007)07)utary valley to the Rofental, has retreated by about 2 km since the 1970s (determined based on historic aerial image collection of the State of Tyrol (Laser-und Luftbildatlas Tirol, 2022)) and has now retreated behind a rock sill.Additionally, several small lakes have formed in the its glacier foreland that likely act as sediment sinks.Beyond that, aAnother two smaller glaciers in the North-Eastern part of the Vent catchment, Mean annual temperature in Vent and mean summer temperature at gauge Vernagt at both locations show gradual positive trends but we do not find evidence for a clear change point.However, mean July temperatures show high change point probabilities around 1981 at both locations, which is probably heavily influenced by the extraordinarily high temperatures in July 1982 and 1983.High temperatures in July are especially relevant for firn and glacier melt, since July is the month with the highest firn and ice melt contribution to discharge, after snow melt contributions have peaked in June and snow cover has decreased substantially .This is also reflected in the mass balance time series of the Hintereisferner and Vernagtferner, which are almost exclusively negative after 1980, with very negative summer mass balances in1982 and 1983.Escher -Vetter (2007)andAbermann et al. (2009)have also shown that mass loss started in 1981 at both glaciers, resulting from negative summer balances.With respect to the Vernagtferner, the drastically higher ablation area ratio of almost 80 % in 1982 as compared to around 25 % in the preceding years(Escher-Vetter and Siebers, 2007)indicates that large areas of the glacier became snow-free in 1982.This entails crucial changes in albedo and therefore intensified ice melt and thinning of firn areas due to rising energy absorption at the glacier surface(Escher-Vetter, 2007; Braun Mitterkarferner and Platteiferner ("MKF" and "PF" in Figfig.1),have disappeared almost completely since the 1970s.Conversely, the Vernagtferner glacier has also experienced considerable loss in area and volume, but lacks such pronounced qualitative changes thus far.2.11.Part III: Analysis of predictors