Diagnostic evaluation of river discharge into the Arctic Ocean and its impact on oceanic volume transports

This study analyses river discharge into the Arctic Ocean using state-of-the-art reanalyses such as the fifth-generation European Reanalysis (ERA5) and the reanalysis component from the Global Flood Awareness System (GloFAS). GloFAS, in its operational version 2.1, combines the land surface model (Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land, HTESSEL) from ECMWF’s ERA5 with a hydrological and channel routing model (LISFLOOD). Further, we analyse GloFAS’ most recent version 3.1, which is not coupled to HTESSEL but uses the full configuration of LISFLOOD. 5 Seasonal cycles, as well as annual runoff trends are analysed for the major Arctic watersheds Yenisei, Ob, Lena and Mackenzie where reanalysis-based runoff can be compared to available observed river discharge records. Further, we calculate river discharge over the whole Pan-Arctic region and, by combination with atmospheric inputs, storage changes from the Gravity Recovery and Climate Experiment (GRACE) and oceanic volume transports from ocean reanalyses and assess closure of the non-steric water volume budget. Finally, we provide best estimates for every budget equation term using a variational adjust10 ment scheme. Runoff from ERA5 and GloFAS v2.1 feature pronounced declining trends, induced by two temporal inhomogeneities in ERA5’s data assimilation system, and seasonal river discharge peaks are underestimated by up to 50% compared to observations. The new GloFAS v3.1 product exhibits distinct improvements and performs best in terms of seasonality and long term means, however opposing to gauge observations it also features declining runoff trends. Calculating runoff indirectly through 15 the divergence of moisture flux is the only reanalysis-based estimate that is able to reproduce the river discharge increases measured by gauge observations (Pan-Arctic increase of 2% per decade). In addition we examine Greenlandic discharge, which contributes about 10% of the total Pan-Arctic discharge and features strong increases mainly due to glacial melting. The variational adjustment yields reliable estimates of the volume budget terms on an annual scale, requiring only moderate adjustments of less than 3% for each individual term. Approximately 6583±84 km freshwater leave the Arctic Ocean per year 20 through its boundaries. About two thirds of this are contributed by runoff from the surrounding land areas to the Arctic Ocean (4379±25 km per year) and about one third is supplied by the atmosphere. However, on a seasonal scale budget residuals of some calendar months were too large to be eliminated within the a priori spreads of the individual terms. This suggests that systematical errors are present in the reanalyses and ocean reanalyses data-sets, which are not considered in our uncertainty


Introduction
Rapid surface warming in the Arctic region has strong impacts on the Arctic water balance and its individual hydrological components, almost certainly leading to an amplification in runoff, evapotranspiration and precipitation (Rawlins et al., 2010;Collins et al., 2014). Increasing river discharge and precipitation trends and intensified sea ice melt coupled with an increase 30 of freshwater inflow through Bering Strait lead to an increase of liquid freshwater stored in the Arctic Ocean (Morison et al., 2012;Haine et al., 2015;Haine, 2020). Ultimately, this could result in enhanced southward exports of low-density waters (Lin et al., 2021) into the Atlantic Ocean, impacting the oceanic circulation also on a global scale. Altogether the hydrological cycle is a complex process with tight coupling between the individual components, having impacts on energy and mass budgets and eventually sea level rise, both regionally (Proshutinsky et al., 2001;Moon et al., 2018) and globally (e.g., Box et al., 2018). 35 Therefore, the quantification of the individual hydrological components and their changes is of great importance.
With the Arctic Ocean being almost entirely surrounded by landmasses and some of the world's largest rivers draining into it, the link between ocean and surrounding land is strong. Hence, runoff forms one of the key variables in the Arctic freshwater budget. However, direct quantification of river discharge into the Arctic Ocean is aggrevated by the fact that about 30-40% of the Pan-Arctic drainage area is now unmonitored (Shiklomanov et al., 2002). Hydrological monitoring suffered a widespread 40 decline from 74% in 1986 to 67% by 1999 -in Siberia even 73% of river gauges were closed between 1986 and 1999 (Shiklomanov et al., 2002;Shiklomanov and Vuglinsky, 2008). In addition, significant portions of the rivers' discharge may bypass the gauging stations through braided channels or as submarine groundwater. Further, climatological conditions pose a hindrance to gauge measurements, as temperatures in the northern latitudes often lead to river freeze up in late autumn and flooding in spring due to river-ice break up (Syed et al., 2007). 45 Atmospheric reanalyses produce gridded estimates of atmospheric and land components, providing spatially continuous estimates of variables such as runoff. They represent highly useful tools for climate monitoring. In this study we evaluate runoff from state-of-the art reanalyses and GloFAS to provide a best estimate of Pan-Arctic river discharge and to incorporate it into the Arctics freshwater and volume budget. However, data assimilation systems can introduce biases and temporal discontinuities, as changes in the observing system are sometimes inevitable and may lead to inhomogeneities in the time series. One 50 known change is the introduction of the IMS (Interactive Multisensor Snow and Ice Mapping System) snow product in ERA5, which led to a negative shift in ERA5's snowmelt and consequently also runoff (Hersbach et al., 2020;Zsótér et al., 2020).
The first complete freshwater budget for the Arctic Ocean is proposed by Aagaard and Carmack (1989) and updated by Serreze et al. (2006) and Dickson et al. (2007). Since then, the amount of available data, in particular of atmospheric and oceanic reanalyses, opened new possibilities for evaluation of the coupled oceanic and atmospheric energy and hydrological cycles. For 55 example, Mayer et al. (2019) have presented a substantially improved depiction of the Arctic energy cycle compared to earlier assessments. With regard to freshwater or the water volume budget, the data situation has improved as well. New collections of hydrological data of the far north have been published (Shiklomanov et al., 2021b). Tsubouchi et al. (2012Tsubouchi et al. ( , 2018 presented observation-based estimates of volume fluxes through Arctic gateways. This opens the opportunity to overspecify the Arctic volume budgets and thus also to give residual and bias estimates. 60 This paper is structured as follows. The next section describes the used data and presents the study domain, followed by the methodology. The results are presented in Sect. 4 and are subdivided into seasonal cycles and trends for the four major Arctic watersheds (Sect. 4.1), Pan-Arctic seasonalities and trends (Sect. 4.2) and an assessment of budget closure by comparison with oceanic volume fluxes (Sect. 4.3). Section 5 presents conclusions and in the appendix a list of acronyms used throughout the text can be found. in this study comes from Roshydromet (Ob, Yenisei and Lena) and from the Water Survey of Canada (Mackenzie) and was downloaded through the Arctic Great Rivers Observatory (Shiklomanov et al., 2021b). Table 1 shows coordinates of gauge observations and GloFAS sampling locations for Yenisei, Ob, Lena and Mackenzie. For the Pan-Arctic approach, river discharge from additional 20 rivers was taken. Gauging records for Kolyma, Severna Dvina, Pechora and Yukon are available for our period of interest 1979-2019 and are also downloaded through the Arctic Great Rivers Observatory (Shiklomanov et al., 2021b), 95 while records for 16 further rivers (Pur, Taz, Khatanga, Anabar, Olenek, Yana, Indigirka, Alazeya, Anadyr, Kobuk, Hayes, Tana, Tuloma, Ponoy, Onega, Mezen) are taken from the Regional Arctic Hydrographic Network data set (R-ArcticNET, Lammers et al., 2001) for the period 1979-1999. We calculated an observation-based Pan-Arctic river discharge for the whole period of 1979 to 2019, by calculating discharge separately for every time step (= every month), using all river discharge measurements available at those time steps. The total Pan-Arctic discharge is then obtained by calculating river discharge for the ungauged 100 area at each individual timestamp (using two different calculation methods -see section 4.2) and adding it to the observed discharge.
Atmospheric components like precipitation, evaporation, atmospheric storage change and the divergence of moisture flux (VI-WVD) are taken from ERA5 and in Sect. 4.3 we additionally use VIWVD data from the Japanese 55-year Reanalysis JRA55 (Kobayashi et al., 2015) and JRA55-C (Kobayashi et al., 2014), which only assimilates conventional observation data. Land 105 storage is derived from snow depth (given as water equivalent) and soil water changes from ERA5. Groundwater storage is not represented in ERA5 and ERA5-Land and also the representation of frozen land components is not ideal in HTESSEL, as glaciers are depicted as large amounts of snow which are kept fixed to 10 m of snow water equivalent. When melting conditions are reached, the snow produces a water influx to the soil and consequently contributes to the total runoff. However, the mass balance is not accounted for over glaciers as the snow is restocked to constantly stay at the fixed 10 m level and hence changes 110 in the glacial storage component cannot be assessed properly. The soil water content includes liquid as well as frozen components and thus also includes permafrost. When the soil temperature reaches melting conditions, the soil water contributes to sub-surface runoff and the soil water storage declines. However, a recent study by Cao et al. (2020) concluded that ERA5-Land soil data are not optimal for permafrost research, due to a warm bias in soil temperature that leads to an overestimation of the active-layer thickness and an underestimation of the near-surface permafrost area. Therefore, we additionally include satellite 115 data from GRACE (Gravity Recovery and Climate Experiment, Tapley et al. (2004)) and GRACE Follow-On (Landerer et al., 2020), as land water storage from GRACE includes changes in soil moisture (including permafrost), glaciers, snow, surface   Ocean (GLORYS, Garric et al., 2017)) and the ECMWF (ORAS5, Zuo et al., 2018Zuo et al., , 2019. While the GREP ensemble members use the same ocean modeling core and atmospheric forcing (ERA-Interim, Dee et al., 2011), there are differences in used observational data and data assimilation techniques, as well as in the reanalysis initial states, NEMO (Nucleus for European Modelling of the Ocean) versions, the sea-ice models, physical and numerical parametrizations and air-sea flux formulations.
The data assimilation methods differ in many points, including the deployed assimilation schemes which range from 3DVAR 135 (three-dimensional variational data assimilation) to SEEK (Singular Evolutive Extended Kalman Filter). Furthermore, there are differences in the input observational dataset, in surface nudging, in the time-windows for assimilation and analysis as well as in the applied bias correction schemes. All those differences lead to an important dispersion between the reanalysis implementations and add up to the ensemble spread (Storto et al., 2019). For further details we refer to Storto et al. (2019) as well as the individual data documentations. In addition, volumetric fluxes are derived from moorings within the so-called 140 ARCGATE project , covering the period from October 2004 to May 2010. These observation-based estimates of volume fluxes come from a mass-consistent framework that views the Arctic Ocean as a closed box surrounded by landmasses and hydrographic observation lines located in the four major Arctic gateways. The hydrographic lines consist of arrays of moored instruments measuring variables temperature, salinity and velocity, making it possible to calculate fluxes of volume, heat and freshwater. For more details about the framework see Tsubouchi et al. (2012Tsubouchi et al. ( , 2018  km 2 ) and the land area draining into it (purple shading; corresponds to 18.2 × 10 6 km 2 for mainlands and islands and additional 0.95 × 10 6 km 2 for Greenland). The colour grading of the land areas indicates the four largest river catchments (Ob, Yenisei, Lena and Mackenzie; darkpurple), the additional 20 catchments with observations and the smaller catchments and coastal areas where no observations were available (light-purple).
Furthermore, there are two small passages, Fury and Hecla Strait, that connect the Arctic Ocean with Hudson Bay.
The terrestrial domain consists of land areas draining into the Arctic Ocean, including the Canadian Arctic Archipelago (CAA) as well as islands along the Eurasian coast. At the Pacific passage Yukon and Anadyr rivers are considered, as they repre-155 sent important sources for inflow of low salinity waters into the Arctic Ocean through Bering Strait. The total oceanic and terrestrial areas correspond to 11.3 × 10 6 km 2 and 18.2 × 10 6 km 2 respectively. For volume budget analyses we further incorporate Greenlandic discharge and storage change north of Davis and Fram Strait, which adds an additional terrestrial catchment area of 0.95 × 10 6 km 2 . The outlines for the individual river catchments were taken from the CEO Water Mandate Interactive Database of the World's River Basins (http://riverbasins.wateractionhub.org/) and regional outlines like e.g., the Canadian 3 Methods

Budget equations
A common way to calculate the oceanic freshwater budget is through the assumption of a reference salinity (e.g., Serreze et al., 2006;Dickson et al., 2007;Haine et al., 2015). However, the outcome is dependent of those reference salinities in a nonlinear 165 way, so that slight differences in the choice of the reference value lead to very different estimates of freshwater transports, both temporally and spatially. Hence, Schauer and Losch (2019) declared freshwater fractions not useful for the analysis of oceanic regions and rather recommend the usage of salt budgets for salinity assessments. In this paper we do not calculate salt budgets, but we estimate volume budgets, and hence also avoid the usage of a reference salinity. Hereinafter, the volumetric budget equations for atmosphere, land and ocean are formulated.

Atmosphere
The change of water storage in the atmosphere -here expressed as water vapor integrated from the earth's surface to the top of the atmosphere (i.e. total column water vapor; hereafter denoted S A ) -denotes the left side of the volumetric budget equation. Atmospheric liquid water and ice are neglected, as they represent only a very small fraction of water in comparison to atmospheric water vapor and lateral moisture fluxes. Generally atmospheric water in liquid and solid phase are only present 175 in significant amounts in regions with high tropical cumulus clouds and over warm ocean currents (Serreze and Barry, 2014).
The atmospheric storage change (S A ) is balanced by the surface freshwater fluxes evapotranspiration ET and precipitation P (both in SI units ms −1 ) and the vertically integrated horizontal moisture flux divergence (last term; hereafter denoted as VIWVD): with the gravitational constant g, surface pressure p s , specific humidity q, density of freshwater ρ w and the horizontal wind vector v. The equation above is probably more familiar to the reader as a mass budget equation, without ρ w in the denominator.
In order to get volumetric water fluxes, we divided by the density of freshwater ρ w =1000 kg m −3 , which is assumed constant in this paper, neglecting dependence on temperature and soluble substances. Further, all terms are integrated over the total Arctic area A total (including the Arctic Ocean and all terrestrial catchments) to obtain SI units of m 3 s −1 . For presentation of results, 185 we will often use Sverdrup Sv (=1e6 m 3 s −1 ), milli-Sverdrup mSv (=1e3 m 3 s −1 ) or km 3 yr −1 as more convenient units.

Land
The change in land water storage (S L ) can be expressed as sum of changes of volumetric soil water SW V L n integrated over the corresponding soil depth l n , changes in snow depth SD -given as snow water equivalent -and changes in groundwater storage GS. Changes in land water storage are balanced through precipitation P L and evapotranspiration ET L over land and 190 runoff R (in SI units ms −1 ). To obtain volumetric fluxes we again perform areal integration over the corresponding area (here the land area A l ).
Rearranging Eq. (2) we obtain an estimate of the water discharging into the Arctic Ocean independent of runoff itself. As we prefer analyzed quantities we can further insert Eq. (1) to substitute P L and ET L , which are derived from short-term forecasts 195 from ERA5, through the analyzed quantities VIWVD L , and atmospheric storage change:

Ocean
Following Bacon et al. (2015) the oceanic mass budget equation can be expressed as The left side of Eq. (4) denotes the change in mass over a closed volume, F surf describes any surface mass input/output in the form of precipitation, evaporation and runoff. The last term of Eq. (4) denotes ice and ocean side-boundary fluxes from or into the volume, with the horizontal sea water velocity c and integration along the along-boundary coordinate s and depth z.
Further, we apply the Boussinesq approximation and assume ρ as constant. We adopt the reference density used in the Nucleus and enter the ocean laterally over its vertical boundaries, described by the last term of the equation (oceanic lateral transport, denoted F). The liquid portion of F is calculated by integrating the cross-sectional velocity component along the side areas of the Arctic boundary. Additionally, we add ice transports, which are calculated analogously by integrating the cross-sectional ice velocity over the grid-point-average ice depth and integrating it over the Arctic boundary. As volume exchange between liquid ocean and sea-ice is conserved in the NEMO model, we additionally remove the liquid water volume that is actually 215 replaced by sea ice, which we call the equivalent liquid water flux. The equivalent liquid water flux at a given grid point is calculated by integrating the liquid volume flux over the grid-point-average ice depth and taking 90% of the result (as only 90% of the icebergs are underwater). As ice velocities from the public CMEMS data portal are only available from two of the ocean reanalyses (ORAS5 and GLORYS2V4) we calculate the ice flux "correction" term for the GREP ensemble by taking the mean of those two products. However, as the impact of the correction is quite similar for ORAS5 and GLORYS2V4 we 220 believe that the correction is accurate enough for the purpose of this study. Changes in ocean density do not affect the volume as the steric effect is missing due to the Boussinesq approximation.
In this paper we mostly present monthly means, derived by averaging the corresponding fields from reanalyses in their native temporal resolution. Horizontal interpolation and vertical interpolation has been avoided by using all reanalysis products in their native grid representation. Care has been taken also to average over the same area for all products as far as this is possible.

225
The lateral volume fluxes through the ocean gateways were evaluated along paths on the native ORCA grid that followed the ARCGATE mooring arrays as closely as possible -ORCA coordinates are given in Table 3. This is essential, since the net lateral volume fluxes in and out of the Arctic are very small (∼0.2 Sv in the annual mean) compared to the fluxes through individual straits (e.g. ∼2.3 Sv for Davis Strait, Curry et al., 2011).

Variational approach for budget closure 230
Given all the various, largely independent data sets we use, closure between the budget terms will not be perfect, resulting in a budget residual. To get rid of any residual and obtain a closed budget with physical terms only, we follow methods by Mayer et al. (2019), L'Ecuyer et al. (2015 and Rodell et al. (2015) and use a variational Lagrange multiplier approach to enforce budget closure on annual and monthly scales. Therefore the following cost function J is minimized: With the Lagrange multiplier λ, the a priori estimates of the budget terms F i , the adjusted budget terms F i , the uncertainty of the respective budget term σ 2 i and the budget residual i F i .

Annual optimization
Inserting the annual means of the individual budget terms into Eq. (6) and differentiation in respect to λ and the a priori estimates of the budget terms, yields eight equations with eight unknowns. Solving the system of equations results in an 240 expression for the adjusted budget terms F k : Hence, the budget residual is distributed across the budget terms according to their relative uncertainty. The a priori uncertainties are derived from the standard deviations of the mean annual budget terms. The a posteriori uncertainties σ 2 k are calculated following :

Monthly optimization
Monthly optimization is performed in two steps. First adjusted fluxes are calculated for each month separately following Eq.
(7), whereat the a priori uncertainty is estimated by taking the maximum of the seasonal standard deviations and is kept fixed throughout all months. However, the annual means of the resulting monthly fluxes do not coincide with the annually optimized 250 fluxes. Therefore we follow Rodell et al. (2015) and apply a second Lagrangian optimization, where the adjusted monthly fluxes from the first step (F k ) are adjusted in relation to their uncertainty, so that their annual mean is equal to the annually optimized fluxes F m : However, the second step again generates small monthly residuals. Therefore the whole procedure is performed iteratively a 255 second time, using the a posteriori uncertainties gained through Eq. (8). This results in the desired monthly fluxes, that satisfy both, a closed budget and consistency with the annually optimized fluxes.

Trend and relative error calculation
We calculate relative, decadal trends following Zsótér et al. (2020) and Stahl et al. (2012) by applying a linear regression to the annual mean time series: The slope of the time series is the annual change obtained through the linear regression and the mean is the long-term annual mean of the timeseries. The multiplication factor 10 results as we calculate trends over a fixed 10-year period. Hence, a trend of e.g. 0.1 is equal to an increase of 10% over a decade. All trends are calculated over the common period of the discharge datasets 1981-2019, except for GloFAS E5L which is calculated over 1999-2018. We do not consider temporal auto-correlation, 265 assuming that subsequent annual means are only weakly correlated, and determine significance using the Wald Test with a t-distribution, where p-values smaller than 0.05 are considered as significant.
To compare river discharge estimates from the various reanalyses to river discharge observations we use the Pearson's correlation coefficient r and a normalised root mean square error (NRMSE), which is calculated by dividing the RMSE through the RMS of the observed values (N RM SE(x)=RM S(x − obs)/RM S(obs)).

4 Results and Discussion
We first discuss seasonal cycles and trends for the four major Arctic catchments -Yenisei, Ob, Lena and Mackenzie. Then we extend our assessments to the Pan-Arctic region, where we compare the total terrestrial Arctic runoff with oceanic volume fluxes through the main gateways. Observations show a distinct runoff peak in June due to snow melt and river ice breakup, and weak runoff through winter. The 280 spring flood season of Eurasian rivers depends on the basin size and usually ends by the end of June at small size rivers and by the end of July or beginning of August at large rivers like Ob, Yenisei and Lena (Yang et al., 2007). While smaller rivers usually exhibit a low-flow season in the summer to fall period, discharge from larger rivers mostly shows a slower decrease, also because of summer rainfalls providing additional discharge water. Especially at rivers of Eastern Siberia (e.g. Lena) intense rainfalls in the summer-fall season may occasionally cause rainfall floods (Shiklomanov et al., 2021a). In late winter, with the 285 maximum of river freeze-up, discharge reaches its minimum.

Analysis of major catchments
Runoff data from ERA5 and consequently also from GLOFAS E5 underestimate the summer peaks recorded by gauges and reach only about 25 to 50% of the observed peak discharge values. In the low flow season the reanalyses slightly exceed observed discharge, however this does not alter the annual means considerably resulting in a clear underestimation of discharge by ERA5 and GLOFAS E5 also in annual terms. The difference between ERA5 runoff and GLOFAS E5 discharge is expected to 290 be caused by two sink terms, a groundwater loss component, calibrated in LISFLOOD, that removes water that is lost to deep groundwater systems, and the open water evaporation component, which also removes water through evaporation over water surfaces in LISFLOOD. The negative contribution can reach up to 20-40% of the average flow by any of the two terms in certain regions (described in the LISFLOOD model documentation, Burek et al., 2013). Better accordance in terms of peak height and annual means is achieved by runoff from ERA5-Land -except for the Lena basin -while the sink terms in GloFAS again 295 causes an underestimation by GLOFAS E5L . At the Ob basin the runoff peak in ERA5-Land, and also ERA5, occurs in May, thus a month earlier than the observed peak. GloFAS discharge is not in phase with ERA5 runoff, but reaches its peak in June, presumably due to the delay by the river routing component. In contrast to GLOFAS E5 and GLOFAS E5L , GLOFAS E5new reaches values similar to observations and agrees best with the observed values in terms of annual means, peak heights and seasonality. The middle panels of Fig. 2 show snowfall and snow melt as well as the atmospheric components net precipitation 300 (P-E) and divergence of moisture flux (VIWVD). Seasonal cycles of P-E and VIWVD minus storage change agree quite well in terms of peak heights and timing, with low moisture inflow and low net precipitation in summer and higher values in autumn and winter. Annual values of P-E and VIWVD-∆S differ by 2-16% depending on the catchment. Seasonal cycles of ERA5 snow melt show that there is a lag of one month between the peak in snow melt and river discharge. This can partly be explained by the time it takes for the water to reach the river mouth and by water resource management effects, but is mostly caused due 305 to delayed river ice break up in the lower parts of the basins. For example, at the upper part of the Ob river, ice breaks up around April to May, while the lower part breaks up between May and June (Yang et al., 2004b). While human impacts through water withdrawals for agricultural use are rather limited compared to rivers in lower latitudes, water resource management via dams and reservoirs can significantly alter the seasonal discharge cycle of the larger Arctic rivers (Shiklomanov et al., 2021a). Especially the Ob and Yenisei basins, but also Lena and Kolyma, are affected by multiple reservoirs using water for hydroelectric 310 power generation and delaying discharge from high-flow periods to the low-flow season (Lammers et al., 2001;Ye et al., 2003;Yang et al., 2004b, a;Shiklomanov et al., 2021a).
The lower panels of Fig. 2 show land storage change from ERA5 and from GRACE. Additionally, ERA5 storage is separated  for GloFASE5L). Shading denotes the ± one standard deviation, µ are the long-term means. Furthermore, geographic coordinaets of gauge observations and catchment areas are given. Long term annual means in mm/day are given in the legend. into its components of soil water change ∆S SW V L and snow depth change ∆S SD . In autumn and winter, land water is accumulated through a rise in snow depth, in summer storage is lost through snow melt and associated runoff, and to a smaller 315 part also through evaporation. GRACE shows a significant annual loss of water storage over the past decades, while storage change in ERA5 exhibits only a slight annual decline. In terms of seasonality GRACE features the largest storage changes in June and July, while ∆S from ERA5 tends to peak 1-2 month earlier. Again this could be caused by delayed river ice breakup and backwater that is observed by GRACE, but not represented in ERA5. In contrast to GLOFAS E5 and GLOFAS E5L , GLOFAS E5new exhibits greatly improved river discharge values and concerning the normalised RMSE values it provides the best results when compared to observations. Both discharge from GLOFAS E5new 335 and estimation through P-E also feature slightly negative trends, only calculation through VIWVD exhibits no, or even slightly positive trends.

Pan-Arctic approach
To get a complete estimate of the total amount of freshwater entering the Arctic Ocean via land, which is needed for subsequent budget calculation, river discharge is calculated over the whole Pan-Arctic area. First we look into the Pan-Arctic seasonal cy-340 cle and afterwards at annual means and long-term trends over the whole Arctic drainage area.
Pan-Arctic freshwater discharge estimates vary significantly between different studies. This comes to a big part from different definitions of the geographic area (Prowse and Flegg, 2000;Shiklomanov and Shiklomanov, 2003) as there is no strict boundary to the south that defines the Arctic and past studies disagree whether to include e.g. Greenland and the Hudson Bay or not. Another reason for discrepancies between different studies comes due to different approaches of discharge evaluation  measurements. Using only in-situ measurements poses several challenges -especially the handling of the large unmonitored areas, that account for about 30-40% of the total drainage area (Shiklomanov et al., 2002). Most studies adopt the method of hydrological analogy (Shiklomanov and Shiklomanov, 2003), a method where total discharge is calculated by extrapolating gauged runoff to the unmonitored rivers and streams. However, hydrological, climatic and land cover conditions between 350 gauged and ungauged areas can differ quite a lot, resulting in inaccurate estimates (Shiklomanov et al., 2021a). Hence, we compare two different estimation methods for observed discharge that are displayed in Fig. 4. For that purpose we consider gauging records from the 24 largest Arctic rivers, that amount for roughly 70% of the total drainage area used in this study.
Pan-Arctic river discharge for the ungauged areas for each month was estimated as follows: Firstly following e.g. Serreze et al. (2006) and applying the method of hydrological analogy for each calendar month by transforming discharge into runoff and, 355 with the assumption that runoff from the ungauged area is the same as runoff from the gauged area, multiplication with the  Table 4. Mean values and standard deviations of observed river discharge as well as relative decadal trends and their standard errors (see Eq. (10)), calculated over the period 1981-2019. Trends that are not significant are marked with an superscript n .
whole drainage area to transform back to river discharge -here denoted as ae (area estimate). However, taking runoff from reanalysis (e.g. ERA5-Land) for the same 24 drainage areas and expanding it over the whole area yields different results, with smaller summer peaks, than taking runoff directly from the total drainage area. Hence, monthly correction factors are calculated from our most reliable products ERA5-Land and GloFAS E5new and by multiplication with those, a more accurate   Fig. 1), raising the total Pan-Arctic storage change to 266 km 3 per year. The largest changes for GRACE water storage occur over the coastal areas of Greenland and the Canadian Arctic Archipelago, further declining trends can be seen over the Arctic islands Svalbard and Novaya Zemlya, as well as over mountainous regions (see

390
Long term means of annual discharge over the coinciding period of all datasets 1999-2018 (1981-2018 in brackets) and the distribution to Eurasia, North America and the CAA can be seen on the left panel of Fig. 6 and long term annual means, relative trends, correlation coefficients and NRMSE values are provided in Table 6. About 75 % of Pan-Arctic runoff come from the as the ± one standard deviation. Furthermore, long term annual means in km 3 yr −1 are given and appropriately the right axis is scaled in km 3 yr −1 . estimate through ERA5's VIWVD features a slight increase of 2%, identically to the trend present in our observation-based estimate. GloFAS E5new comes closest to gauge observations concerning annual means and NRMSE values, while indirectly calculated runoff through VIWVD features the same trend as observations and also the highest correlation concerning annual means. However, the VIWVD estimate generally yields a roughly 5% lower discharge compared to observations.

410
The strong decreases in ERA5 runoff are reinforced through two discontinuities in the dataset, one around 1992 and the second one around 2004, that lead to a significant drop and hence a clear underestimation of runoff over the past decades. As GLOFAS E5 takes ERA5 as input, it also adopts those discontinuities. In contrast, ERA5-Land does not exhibit such breaks, suggesting that the error may come from the data assimilation system in ERA5. While the discontinuity around 2004 was traced back to the introduction of the IMS (Interactive Multisensor Snow and Ice Mapping System) snow product into the Pan-Arctic river discharge /wo Greenland assimilation system (Zsótér et al., 2020), the reason for the break around 1992 could not be identified yet and is discussed further in the following section.
Contrary to discharge estimates from reanalyses, observations show opposing trends and a rise in river discharge over the past   shows annual means for the individual Greenlandic discharge components and Table 7 displays mean values and trends. Liquid 435 ice discharge exhibits a vast positive trend of 26% per decade and also solid ice discharge shows a clear rise of about 8% per decade. It can be expected that Greenland discharge will further increase in the future (Muntjewerf et al., 2020;Church et al., 2013;Vaughan et al., 2013, e.g.) Adding Greenlandic discharge to our Ge observation estimate yields a total (pan-Arctic plus Greenland) discharge of 4423 km 3 yr −1 . A recent assessment (Shiklomanov and Lammers, 2013;Shiklomanov et al., 2021a) estimates the total discharge to 440 the Arctic Ocean at approximately 4300 km 3 yr −1 for the period 1936-2006. Differences may stem from the use of different data products, from slight variations in discharge area -Shiklomanov and Lammers (2013) use an area of approximately 19 × 10 6 km 2 , while our area together with the considered Greenlandic area (0.95 × 10 6 km 2 ) sums up to 19.15 × 10 6 km 2 -and from different calculation periods -discharge was very likely smaller in the mid-20th century.

ERA5 Runoff discontinuities
As mentioned above, a possible reason for the negative shifts in ERA5 runoff lies in the data assimilation system and its removal of soil moisture. Zsótér et al. (2020) assess the GLOFAS E5 river discharge reanalysis as well as ERA5 and ERA5-Land runoff and compare them with available river discharge observations. Similarly to our diagnostics for Arctic rivers, Zsótér et al.
(2020) find river discharge decreases in GloFAS E5 for several rivers of the world, that are not supported by observations. They find trends in tropical and subtropical areas being driven by changes in precipitation, while changes in snow melt have a very 450 strong influence on river discharge trends in the northern latitudes. Thus, the runoff decreases in the northern high latitudes are likely linked to snow assimilation and other processes related to snow melt. Figure 7 shows ERA5 snowfall, snow melt and the sum of snow melt and snow evaporation as well as the corresponding parameters from ERA5-Land for the whole Pan-Arctic region, Eurasia, North America and the CAA. ERA5 shows substantial differences between snow gain and snow loss, and Pan-Arctic snow melt exhibits the same discontinuities around 1992 and 2004 as runoff from ERA5. In contrast, snow gain and 455 snow loss are in balance for ERA5-Land. Zsótér et al. (2020) find similar results and attribute the differences to the land data assimilation, that has impacts on snow and soil moisture. The discontinuity in 2004 was traced back to a change in operational snow analysis, through the introduction of the 24-km Interactive Multi-Sensor Snow and Ice Mapping System (IMS) snow cover information to the snow assimilation system in 2004 (Zsótér et al., 2020). While the discontinuity around 2004 is present in the Eurasian watershed as well as in North America, the discontinuity around 1992 is only present in Eurasia. Figure 8   460 shows the spatial distribution of the snow melt discontinuities, calculated as differences between snow melt climatologies from 1979-1991 and 1992-2003, as well as 1992-2003 and 2004-2019. The latter difference exhibits spurious signals at the coastal mountain range of Alaska, the Rocky Mountains and at mountainous regions in Siberia, the prior discontinuity spots rather random signals only over Eurasia.
This discontinuity issue is not only limited to ERA5, but rather a general issue in reanalyses, as observation platforms are 465 changing through time, making it practically very hard to make these products perfectly homogeneous in time. Especially satellite data were not available in the early days and were introduced with the development and introduction of new instruments. If the redundancy is large enough, then any discontinuity impact should be less pronounced. However, specifically for snow there is only the IMS product that was introduced in 2004, and hence any inhomogeneity generates a larger impact. Thus, this impact could possibly be reduced when using other data sets on top of the IMS product or instead of it, which ideally go 470 further back in time.

Comparison with oceanic fluxes
In this section we look at the oceanic volume budget as described in Eq.  The GREP ensemble shows an annual mean lateral transport out of the Arctic region of 203 ± 48 mSv, with ORAS5 being on the higher end of the ensemble with an annual outflow of 232 ± 48 mSv and a summer peak of 578 mSv. The seasonal cycles of the oceanic volume transports resemble the seasonal cycles of the freshwater input and peak in June, as the ocean reacts 480 almost instantaneously to surface freshwater input and generates barotropic waves that lead to mass-adjustment within about a week (Bacon et al., 2015).
In addition to the oceanic transports the left panel of Fig. 9 shows the forcing terms involved in the generation of the ORAS5 fluxes: In addition to net precipitation from ERA-Interim and river discharge from the BT06 climatology, we need to add a surface salinity damping term (damp ORAS5 ), which represents an additional non-physical surface freshwater forcing in the ocean reanalyses (last term of Eq. 6.1 in Madec et al., 2019). Additionally, a freshwater adjustment term (FW adj ) should be added due to the assimilation of global mean-sea-level changes. This term is not archived in ORAS5 and hence is not considered in our analysis (therefore it is marked grey in Eq. (11)). However, we deem the freshwater adjustment term as a rather small source 490 of error and nevertheless would expect good closure concerning Eq. (11). Indeed, with an annual deviation of 11 mSv and an NRMSE value of 0.19 the accordance between F ORAS5 and its freshwater input estimate is reasonable. In addition to the FW adjustment term, another cause for the discrepancies between forcing and computed volume fluxes is that the ocean reanalyses compute their own turbulent air-sea fluxes (REF) and do not use that from ERA-Interim. However, given the generally low values of sensible and latent heat fluxes in the Arctic we consider this a moderate source of error.

495
The right panel of Fig. 9 shows various estimates of freshwater input minus oceanic storage change. To estimate the atmospheric freshwater input over the ocean we take net precipitation from ERA5 as well as VIWVD from ERA5, ERA-Interim, the Japanese 55-year Reanalysis JRA55 and JRA55-C. For the reanalysis-based estimates we use the river discharge reanalyses we have most confidence in, namely GloFAS E5new , ERA5-Land runoff and the indirect estimates through P-ET and VIWVD minus storage change. Land storage is only derived from GRACE, while oceanic storage is taken from GRACE and from 500 ORAS5. GRACE and ORAS5 show quite similar seasonal cycles, with mass being accumulated by the ocean in summer and released in winter, most likely caused by the seasonal variation of wind stress curl and seasonal changes in Ekman pumping (Bacon et al., 2015). In terms of annual trends both ORAS5 and GRACE indicate a slight increase of mass over the past decades. The observation-based estimates are calculated using river discharge from our observatiom-estimates (ae, Ee, Ge) and oceanic storage change from GRACE. The atmospheric components VIWVD and atmospheric storage change over the ocean 505 are still taken from reanalyses (ERA5, ERA-Interim, JRA55, JRA55c), as we lack the corresponding observations. Greenlandic discharge is taken from Mankoff et al. (2020a, b). Calculating the mean over our reanalysis-and observation-based estimates we obtain annual values of 207 ± 4 mSv and 208 ± 3 mSv, respectively. Thus, compared to the GREP volume flux we obtain an imbalance of 4 − 5 mSv, representing less than 3% of the physical fluxes.   Table 8. 1993-2018 ( * 2004-2010) long term means, NRMSEs and correlation coefficients of oceanic volume transports from GREP, ORAS5 and ARCGATE, as well as estimates derived from the forcing terms used in ORAS5 (as described in Eq. (11)). Additionally, our reanalyses and observation based estimates are given. NRMSE values and correlation coefficients are calculated using monthly means with respect to GREP( 1 ) and ORAS5( 2 ). Units are milli-Sverdrup (mSv).
to river icing and low precipitation values, however freshwater that entered the Arctic in the past season and moves slowly with the ocean currents still finds its way through the Arctic gateways. In June, the large freshwater peak generates barotropic waves that are also seen in the reanalysis volume transports. Residuals arise from different river discharge estimates and are generally positive for reanalysis-and negative for observation-based estimates. In late summer, river discharge tends to decline 515 and precipitation takes over the main role of delivering freshwater to the ocean. Oceanic reanlyses show a fast decline after the June peak and exhibit a volume transport minimum in September leading to a negative residual peak in all estimates. We assume that volume that enters the Arctic as freshwater through precipitation is slowly transported with the ocean currents and modified by wind stress, partly freezes on the way and takes weeks up to months to leave the Arctic area, explaining the elevated winter and spring transports. 520 Table 8 shows annual means of the various transport estimates, standard deviations and NRMSE values. The normalised RMSE values are calculated with respect to GREP ( 1 ) and ORAS5 ( 2 ), indicated by the superscripts. In addition to the estimates from

Volumetric budget closure
To close the volumetric budget and get rid of the small residual we use a variational adjustment procedure (see Sect. 2.1.2).
A priori estimates are calculated by taking the mean over all trustworthy estimates of the individual budget terms -given in 535 Table 9. We perform the adjustment on long term annual means to close the annual budget and on the monthly climatologies of the individual terms to close the budget on a monthly scale. The uncertainties for the annual optimization are estimated as the standard deviations between the various estimates (see Table 9) of the mean annual budget terms. The uncertainties for the monthly optimization are estimated by taking the annual maximum of the monthly standard deviations, indicated by the dashed lines in Fig. 11. Oceanic lateral volume transport F features the largest uncertainties and hence also experiences the 540 largest adjustments amongst all budget terms.  (Volkov and Landerer, 2013;Fukumori et al., 2015) and circulation patterns and exhibits strong nonseasonal fluctuations, further complicating the detection of long-term oceanic mass trends. Volkov and Landerer (2013) find that an intensification of the westerly winds over the North before and after the adjustments. A metric to identify whether the variational adjustment is successful is the comparison of the adjustments of the terms to their respective a priori uncertainty (L'Ecuyer et al., 2015), hence Fig. 13 also shows the spreads of the a priori estimates (shaded areas). Due to the large differences between oceanic volume transports and freshwater input terms in late winter to early spring and in September, both F and runoff R feature adjustments larger than their a priori spreads, suggesting that the a priori uncertainties are larger than estimated. A possible explanation is that systematic biases are not taken 575 into account. The state-of-the art reanalyses exhibit systematic errors in their runoff seasonalities, as the seasonal runoff peaks in summer are too low in comparison to observations, while winter and spring values are too high. Due to the lack of reliable seasonal observations of the oceanic volume fluxes, it is hard to define systematic biases in the ocean reanalyses. However, all four ocean reanalyses feature quite low September volume fluxes, which are not found in their forcing components (see figure 9). Uotila et al. (2019) assess ten ocean reanalyses, including CGLORS, FOAM, GLORYS and ORAS5, specifically 580 in the polar regions and find multiple systematic errors concerning sea ice thickness and extent, temperature profiles, mixed layers as well as ocean transports. Seasonal cycles of volume transports were not assessed, however seasonal cycles of sea ice components and heat transports did exhibit systematic errors. Further analyses would be necessary to come up with robust estimates of the bias in seasonal volume transports.  -We estimate Pan-Arctic river discharge from gauge observations using monthly correction factors from GloFAS E5new , as the common method of hydrological analogy (e.g., Shiklomanov and Shiklomanov, 2003) tends to underestimate the high flow summer peaks (see Fig. 4), and obtain a long term annual flux of 4031 ± 203 km 3 (excluding Greenland).
To compare our results to past studies we adapted the time periods and areal extents accordingly and found reasonable 600 accordance with Haine et al. (2015), who combined runoff from ERA-Interim with river discharge observations and obtained a total discharge about 5% higher than our estimate. An even better agreement was found with the estimates made by Shiklomanov et al. (2021a), as the total Pan-Arctic discharges (including Greenland) agree within 2%.
-Runoff from ERA5 is underestimated compared to observed discharge, with the largest discrepancies occurring in the high flow summer months, and features strong declines of 10±1% per decade on a Pan-Arctic basis and even stronger 605 declines at the major catchments. These strong declines are caused by two inhomogeneities (1992 and 2004) in ERA5's snow melt time-series and contradict the discharge increases found in gauge observations. Those inhomogeneities are caused by a loss of snow through changes in the data assimilation system. While the discontinuity around 2004 was traced back to the introduction of the IMS snow cover information to the snow assimilation system, the 1992 inhomogeneity still needs further investigation.

610
ERA5-Land does not feature these inhomogeneities and exhibits more moderate runoff declines of 5-6±2% for the Eurasian major watersheds and 2±1% for Mackenzie river and on the Pan-Arctic scale. These declines in ERA5-L runoff are caused by similar declines in P-E from ERA5, as P-E is used as a forcing in ERA5-L (see figure 6 and table 5). As observations agree on an increase of river discharge, these declines are deemed unrealistic. An improvement may possibly be achieved when taking the divergence of moisture flux (VIWVD) as forcing, as VIWVD, which is computed 615 from analysed fields rather than short-term forecasts, features similar trends as discharge observations.
-Calculating runoff through the divergence of moisture flux is the only reanalysis-based estimate that exhibits a slightly positive trend of 2 ± 1% per decade and thus features the best agreement with observations in terms of trends and the highest correlation coefficient. However, VIWVD tends to underestimate runoff by roughly 5%.
-River discharge from GloFAS E5 reflects the variation and the inhomogeneities from ERA5 and shows an additional 620 negative shift due to two sink terms removing water in LISFLOOD that lead to a discharge underestimation of up to 50% towards the end of the time series. In contrast, GloFAS E5new entails a considerable improvement and reproduces the observed values best in terms of annual means and NRMSE values.
-Liquid and solid discharge from Greenland account for roughly 10% of total Pan-Arctic discharge and hence should not be neglected in assessments of the Arctic freshwater budget. Liquid discharge features a vast increase of 20±4% per 625 decade and solid discharge an increase of 8±1%. Due to glacial melting also land storage changes are particularly strong over the Greenlandic ice sheet, accounting for roughly half of total Pan-Arctic land storage change.
-Comparing the estimates of freshwater input into the Arctic Ocean that we have most confidence in after the preceding analysis (listed in table 7), to oceanic volume transports through the Arctic gateways computed from ocean reanalysis yields only a small imbalance of less than 3% in terms of annual means.

630
-The variational adjustment procedure provides a best estimate of every budget term for every calendar month -listed in Tab. 10. About two thirds of the freshwater provided for oceanic volume transports come from runoff (4379±25 km 3 per year) and about one third is provided by the atmosphere. Land areas show a strong decline (−246 km 3 yr −1 ) of water storage over the past decades, while oceanic storage features only a slight increase. This leads to a surplus of roughly 220 km 3 per year of more water leaving the Arctic area than entering.

635
In summary, we refined past Arctic water budget estimates (e.g., Serreze et al., 2006;Dickson et al., 2007) and their uncertainties by combining some of the most recent reanalyses data-sets and observations, and by applying a variational optimization scheme. The variational adjustment worked very well on an annual scale and brought reliable estimates of the volume budget terms, requiring only moderate adjustments of less than 3% for each individual term. Adjustments are considered reliable if budget closure is achievable within the respective terms error bounds and if the terms are comparable to estimates from past 640 studies. With an annual value of 4379 ± 25 km 3 (calculated over 1993-2018), our adjusted runoff estimate is slightly higher than estimates made by Shiklomanov and Lammers (2013) and Shiklomanov et al. (2021a) for the period 1936-2006. However, considering the different calculation periods and assuming a decadal rise of roughly 2%, the estimates are considered to be in good agreement. On a seasonal scale however, stronger adjustments were needed to close the budget, and some of the adapted freshwater and volume fluxes fell out of their a priori uncertainty range, suggesting an underestimation of the specified 645 uncertainties. The latter is very likely caused by the presence of systematic errors being present in the data sets, or at least in their seasonal cycles, that are not taken into account in our a priori uncertainty estimates. Especially when calculating Pan-Arctic runoff, caution is needed. Our results show that seasonal peaks of river discharge are underestimated in almost all of the assessed reanalyses (ERA5, ERA5-Land, GloFAS E5 , GloFAS E5L ). The biggest errors are caused by inhomogeneities in the data assimilation system (ERA5 and GloFAS E5 ) which led to a great underestimation of runoff, especially in the latter half of 650 the time series. However, also reanalyses without data assimilation (ERA5-Land and GloFAS E5L ) were not able to reproduce the seasonal cycle of river discharge accurately. On the other hand we find distinct improvements in the new GloFAS E5new product, especially when investigating seasonal cycles and long term means it features considerable enhancements compared to its precursors. However, for interannual variability and trend analysis we recommend the use of the VIWVD estimate, as it reproduces trends from gauge observations more accurately than the other estimates.

655
When extrapolating observed river discharge to the whole Pan-Arctic area we found that the common method of hydrological analogy tends to underestimate the discharge peaks. We therefore advise to use river discharge observations where available and reliable runoff/discharge estimates from reanalyses (e.g., GloFAS E5new or ERA5-Land) to extrapolate discharge to the ungauged areas.
A further possible reason for inconsistencies between runoff and ocean reanalyses is the usage of climatological river discharge 660 data to specify land freshwater input in ocean reanalyses. Our analyses show that the seasonal cycle of the ORAS5 runoff climatology BT06 fits well to our observation based estimates (see Fig. 5), however the lack of inter-annual variability in the freshwater input adversely affects i.a. oceanic volume transports. We note that Zuo et al. (in preperation) work on implementing a time-varying land freshwater input, derived from discharge data from GloFAS, into ORAS5. This should further reduce the inconsistencies between runoff and oceanic volume fluxes from ocean reanalyses.

665
To further refine the budget estimates, longer time series of all budget terms would be needed. For example, one could repeat the analysis using the back extension of ERA5 which goes back to 1950. There is also a new bias-corrected ERA5 data set (WFDE5, Cucchi et al., 2020), that could be examined in terms of the Arctic water budget. Further, it would help to include a precipitation observation data set, preferably one that combines available satellite-based and gauge-based data sets.
Concerning estimation of biases in ocean reanalyses, one could in principle draw on information from oceanographic data 670 for comparison. A main difficulty with oceanographic data is the generally limited temporal and spatial coverage. Nevertheless, the unique form of the Arctic Ocean (as water leaves and enters only through a handful of gateways) allows relatively easy measurements of the in-and outgoing fluxes. As an example, measurements from arrays of moored instruments (like e.g. Acoustic Doppler Current Profilers, MicroCAT -CTD Sensors and Seagliders) were taken to estimate transports through the Arctic gateways using a mass-consistent framework (Tsubouchi et al., 2012. Our results however showed that the moored instruments did not measure the velocity field accurately enough to resolve the barotropic wave signal arising from temporally varying runoff (Tsubouchi, 2019) leading to errors in the seasonality of the net volume flux. A longer measuring period with an even denser monitoring network could help with this aspect.
Code and data availability. ERA5 and ERA5-Land data are available on the Copernicus Climate Change Service (C3S) Climate Data Store (Hersbach et al., 2019;. GloFAS river discharge data are also available on the Copernicus Climate Change Service