The thermal peak: A simple stream temperature metric at regional scale

Spatiotemporally comprehensive stream temperature datasets are rare because interest in these data is relatively recent and there is little money to support instrumentation at regional or national scales. This lack of data has been recognized as a major limitation for understanding thermal regimes of riverine ecosystems. To overcome these barriers, we first aggregated one of the largest stream temperature databases on record with data 10 from 1700 individual stations over nine years from 2009–2017 (n=45,000,000 hourly measurements) across France (area = 552,000 km). For each station, we calculated a simple, ecologically relevant metric–the thermal peak–that captures the magnitude of summer thermal extremes. We then used three statistical models to extrapolate the thermal peak to nearly every stream reach in France and Corsica (n=105,800) and compared relative model performances among each other and with an air temperature proxy. In general, the hottest thermal peaks were 15 found along major rivers, whereas the coldest thermal peaks were found along small rivers with forested riparian zones, strong groundwater inputs, and which were located in mountainous regions. Several key predictors of the thermal peak emerged, including drainage area, mean summer air temperature, minimum monthly specific discharge, and vegetation cover in the riparian zone. Despite differing predictor importance across model structures, we observed strong concordance among models in their spatial distributions of the thermal peak, 20 suggesting its robustness as a useful metric at the regional scale. However, air temperature was a poor proxy for the stream temperature thermal peak across nearly all stations and reaches, highlighting the growing need to measure and account for stream temperature in regional ecological studies.

without seasonal dynamics were previously excluded from this data set. All data were recorded by automatic data loggers managed by professional biologists or hydrologists. Outliers from each station's time series were removed with automatic outlier detection filters and the resulting data were screened visually before being averaged into mean daily stream temperature data (hereafter referred to as Tw). The automatic outlier detection consists of three steps with eight unique filters that remove, in order, 1) hourly Tw anomalies, 2) monthly anomalies between Tw 100 and Tair, and 3) daily anomalies between Tw and Tair (Moatar et al., 2001;Beaufort et al., 2020b).
To address ecologically meaningful temperature metrics under climate change, we focused on the two hottest stream temperature months, July and August (hereafter referred to as summer), a time period essential for the growth and survival of many aquatic species. This focus also has the benefit of maximizing the number of  year (x-axis) and station (y-axis) with the total stations per year listed at the top of each column. Sites are colored 115 by the number of years with observations. c) Distributions of drainage area for RHT reaches (blue) and of thermal stations (black).

Defining the thermal peak stream temperature metric
Due to the limited concordance among stream temperature time series (Figure 1b) and to focus our analysis towards ecologically relevant ends, we summarized our stream temperature data with a simple metric. This kind of metric 120 has precedent in regional species distribution models that instead used air temperature (hereafter referred to as Tair) as a proxy (Buisson and Grenouillet, 2009).
We refer to this metric as the thermal peak (Tp), and define it as the interannual average of the mean temperature of the 30 hottest consecutive days of each year , : Where i = year index; and N= the number of years of observation available from 2009-2017 (N = 1-9). Across sites, the hottest day of the year always occurred within the approximate 30 day period between July 28 and August 30 (mean±sd: August 12±16 days), lending support to this focused approach.

Climate correction of the thermal peak
The thermal peak can be biased depending on the climatic variability of the years of observation for each station. 130 Indeed, only 30 of our stations have Tp calculated using all nine years of data, and therefore have the highest level of confidence in their estimate. We refer to the Tp from these 30 stations as Tp,ref, indicating that these are reference, https://doi.org/10.5194/hess-2021-218 Preprint. Discussion started: 5 May 2021 c Author(s) 2021. CC BY 4.0 License. or true estimates of Tp (Table 1). To account for the bias associated with missing data at the remaining 1670 stations, we gap-filled missing data at these stations using site-specific stream-air temperature regressions. This method accounts for interannual variation in climatic forcing on stream temperature, and we therefore refer to it 135 as a climate correction.
The climate correction is achieved by first calculating station-specific regressions between daily Tw and a rightaligned moving average of daily Tair at lags ranging from 2-10 days. The moving average lag whose regression produced the highest coefficient of determination was then used to fill gaps in the time series of Tw for each station.
Next, we reconstructed summer stream temperature using this regression and subsequently recalculated Tp based 140 on this reconstructed data. We refer to Tp from these climate-corrected data as Tp,clim to indicate that missing data were gap-filled with the climate correction procedure (Table 1).
To validate this approach, we conducted a permutation test on the 30 stations with full annual monitoring from 2009-2017 (i.e., sites where a Tp,ref is known). At each site, we introduced randomly placed, artificial annual gaps into observed data ranging from 1-9 years to simulate missing data. We then backfilled these introduced gaps 145 according to the climate correction method. Following gap-filling, we calculated two metrics: 1) Tp using gapinduced data without gap-filling (Tp,gap), and 2) the thermal peak using the gap-filled data (Tp,fill; Table 1). We then compare these two metrics to the reference thermal peak (Tp,ref) using absolute biases at each tested permutation (i.e., the number of introduced gap years). This approach allowed us to assess whether the climate-corrected reconstruction of the gaps in time series is 1) a useful approach, and 2) lower in bias and uncertainty compared to 150 using observed data alone. Tp,clim thermal peak for stations with less than nine years of data to which climate correction was applied 1670 Tp,gap thermal peak for reference stations with introduced data gaps 30 Tp,fill thermal peak for reference stations whose introduced data gaps were filled with climate correction 30 Tp,m modeled thermal peak using statistical extrapolation to the RHT network using statistical method m 114,600 Tp,air thermal peak estimated using the air temperature proxy 114,600

Extrapolating the thermal peak to national scale with statistical modeling
We estimated Tp throughout the entire RHT network using four distinct statistical models. For modeling (Tp,m) it is not clear how to choose a priori a particular model structure due to the complexity of the processes involved in 155 determining local stream temperature. Therefore, we tested four different structures: 1) a multiple linear regression model (REG), 2) an artificial neural network model (ANN) that is potentially non-linear, but encompasses a linear model as a special case, 3) a random forest model (RF) that can also be non-linear, but with a different strategy than ANN, and 4) a multi-model combination (MM), which combines the three prior structures ANN, RF, and REG. All these models are based on a function to estimate Tp at all stations (i): 160 To measure how stream thermal regime estimations might be different if using a Tair proxy, we also calculated Tp,air using Tair from SAFRAN data. The Tp,air is calculated identically to Tp, but using daily Tair instead of daily Tw, as would be done in ecological studies using Tair as a proxy for Tw. Finally, to determine the effect of climate 165 corrections on the full RHT extrapolation, we compared the distribution of Tp,m values when models were fit using either Tp,obs or Tp,clim.

Explanatory variables
We selected sixteen variables (Table 2)  The eight variables relating to catchment characteristics were extracted from either the SYRAH-CE database (Valette et al., 2012) or the RHT database (Pella et al., 2012). The three variables from RHT comprise: 1) mean (6) Where WV (-) = the relevance of covariate V, AV,h (-) = the ANN coefficients connecting hidden unit h to covariate 210 V, Bh (-) = the ANN coefficients connecting hidden unit h to the output, and nhu = the number of hidden units.

Random forest
We used a random forest for the third statistical model structure, using Breiman's algorithm (Breiman, 2001) with the R package randomForest (Liaw and Wiener, 2002) allowing 500 trees. The relevance of each predictor variable was provided as standard output by the randomForest package, which determines how much the mean square 215 errors in prediction increases when that covariate is randomly permuted within the tree.

Model cross-validation
To assess model performance prior to spatial extrapolation to the entire RHT network, we conducted a cross-225 validation on observed data from our 1700 stations. For each model, including Tp,air, we used 80% of stations (n = 1360) as training data to estimate model parameters. Using those model parameters, we estimated validation data Tp at the remaining 20% of stations (n=340) and cross-validated those estimates with Tp,obs. We conducted this cross-validation this 100 times allowing for random selection of stations used in the training and validation data sets. We evaluated the results of the cross-validation with the Nash-Sutclife efficiency criterion (NSE, Nash and 230 Sutcliffe, 1970), the RMSE (Root Mean Square Error) and biases between observed Tp,obs and Tp,m.

Validation of the climate correction approach
Gap-filling Tw with our climate correction approach resulted in consistently lower absolute biases for Tp (i.e., Tp,fill) compared to only using available data (i.e., Tp,gap), regardless of the length of introduced annual gaps (Figure 2). 235 This bias was less than 0.5°C for more than 75% of stations with only one year of observation. In contrast, when there is only one year available, the biases for data with artificial gaps, Tp,gap, are higher than 0.5 °C for 75% of the stations. There was no systematic bias regardless of the years taken into account in the regressions. annual gaps were filled with the climate correction procedure (|Tp,fill-Tp,ref|, red boxplots) compared to when gaps were unfilled (|Tp,gap-Tp,ref|, blue boxplots) regardless of the number of introduced gaps.

Model cross-validation
In cross validation, all four statistical models performed substantially better than air temperature at accurately predicting Tp (Figure 3) negative NSE for Tp,air indicates that using a simple mean of Tw observations is a better predictor of Tp than using Tair (Figure 3b). Overall, models tend to slightly underestimate Tp (Figure 3a), but mean biases are close to 0°C.
The REG and ANN models had similar performance (median RMSE > 1.5°C; median NSE = 0.6), whereas the RF and MM models obtained the best performances (median RMSE < 1.5°C; median NSE = 0.7) with the MM slightly superior for NSE (Figure 3b,c). We did not observe any spatial patterning in performance metrics (not 250 shown).

Relevance of explanatory variables in models
Explanatory power varied among modeling approaches, and the maximum variance explained for any one variable was between 25-30%. (Figure 4) The two most relevant variables were catchment area (area) and mean summer 260 air temperature (Tsummer) for the RF and multiple regression models (Figure 4b,c), but were minimum monthly specific discharge (qmin) and riparian vegetation cover (veg) for the ANN model (Figure 4a; qmin is not correlated with area, R² = 0). Surprisingly, area and Tsummer obtained relative importances of less than 5% in ANN whereas they were the most influential variables in the RF and the REG models. It should be noted that the cumulative importance of the four most relevant variables of the REG and ANN models is respectively 92% and 81%, which 265 https://doi.org/10.5194/hess-2021-218 Preprint. Discussion started: 5 May 2021 c Author(s) 2021. CC BY 4.0 License. means that the other variables have very little weight in the estimates. This sum is only 69% for RF, which indicates that the relative importance of the explanatory variables are more distributed and the other explanatory variables have a significant weight in the estimates which could explain the best performances in cross validation of RF.

Spatial extrapolation of thermal peaks
The statistical models could extrapolate Tp to 92% (105,800 reaches) of the RHT network, and the resulting spatial 275 structure of the extrapolations was consistent across models ( Figure 5). On average, Tp was 18.2°C and ranged between 6.3°C and 27.0°C. The highest Tp,m (i.e., Tp > 22°C) were generally found on the largest rivers located in 14°C on more streams (> 15%) compared to other models (< 10%; Figure 6a). In stark contrast, estimating Tp with air temperature (i.e., Tp,air) led to consistently higher values than were obtained with statistical models, with Tp,air greater than 20°C for more than 70% of the reaches (Figure 5a, Figure 6a).
Application of the climate correction prior to model fitting and subsequent extrapolation showed that differences at the RHT are less than 0.5°C at 89% of the reaches (Figure 6b). Still, 10% of the reaches exhibited Tp,m differences 285 greater than 0.5°C, and the vast majority of these differences are negative (9.4% vs. 1.6%), suggesting that climate correction more often than not reduces overestimates in Tp.

Discussion 300
We compiled one of the largest regional stream temperature datasets to address the growing need to understand stream thermal regimes in the context of climate change. We demonstrate that a simple, ecologically meaningful metric, which we term the thermal peak (Tp), can be reliably estimated at the regional scale using a few easily accessible explanatory variables.

Horizons and limitations in estimating large-scale stream thermal metrics 305
Spatiotemporally comprehensive stream temperature datasets are rare because interest in these data is relatively recent and there is little money to support instrumentation at regional or national scales. This lack of data has been recognized as a major limitation for understanding thermal regimes of riverine ecosystems (Arismendi et al., 2012;Ouellet et al., 2020). Existing data typically come from different entities and are not managed according to a predefined regional strategy, precluding broad-scale synthesis and understanding of controls on stream 310 temperature and its subsequent effects on ecosystems and society. To overcome these barriers, we employed a combined empirical approach that allowed us to identify, at the regional scale, a map of summer stream temperature maxima with important implications for aquatic species distributions under climate change. We observed the hottest Tp along major rivers and the coldest Tp along small rivers and in mountainous regions ( Figure   5). The downside of the current approach is that it remains based on interannual metrics. Indeed, the non-315 concomitance of the time series does not allow us to compare extreme years (hot vs. cold).
To enable unbiased comparison among stations from time series with gaps, we used a climate correction of Tw based on regression with Tair. The efficacy of regressions between Tw and Tair is well-understood (Ducharne, 2008;Segura et al., 2015, Moatar andGailhard, 2006), and the approach can be easily transferred to other stations and regions. On the other hand, a cautious approach is required for such regressions, because they assume seasonal 320 correlation Tair and Tw, and can therefore only apply to rivers having natural seasonal dynamics, without dam release, thermal peaking, or major weirs regulating the flow of streams (Bruno et al., 2013;Chandesris et al., 2019;Seyedhashemi et al., 2021;Casado et al., 2013). However, where possible, climate correction makes it possible to significantly reduce the biases in the estimates of Tp to metrics based only on observations (Figure 3).
The biases are particularly reduced when the number of years of observation available is less than four. Beyond 325 this limit, the meteorological variability specific to each year is sufficient to estimate a robust interannual metric Tp, which confirms results obtained by (Jones and Schmidt, 2018). Moreover, climate correction reduces overestimation of Tp, when applied at scale (Figure 6b), reinforcing the importance of taking into account these climatic corrections to temperature metrics even if it only slightly affects the majority of rivers (Isaak et al., 2017) 4.2 Spatial extrapolation of Tp is consistent and best predicted with a random forest model 330 All four statistical models achieved similar predictive performance, but the RF model exhibited marginal improvements over the others (Figure 3). The MM approach only slightly improves performance in crossvalidation in comparison with RF (vis-à-vis the NSE criterion), so it is unlikely to be a useful approach in future applications due to its complexity. Our model performances (NSE = 0.75, RMSE <2) were of the same order as https://doi.org/10.5194/hess-2021-218 Preprint. Discussion started: 5 May 2021 c Author(s) 2021. CC BY 4.0 License.