Data assimilation techniques that integrate available observations with snow models have been proposed as a viable option to simultaneously help constrain model uncertainty and add value to observations by improving estimates of the snowpack state. However, the propagation of information from spatially sparse observations in high-resolution simulations remains an under-explored topic. To remedy this, the development of data assimilation techniques that can spread information in space is a crucial step. Herein, we examine the potential of spatio-temporal data assimilation for integrating sparse snow depth observations with hyper-resolution (5

Three different experiments were performed to showcase the capabilities of spatio-temporal information transfer in hyper-resolution snow simulations. Experiment I employed the conventional geographical Euclidean distance to map the similarity between cells. Experiment II utilized the Mahalanobis distance in a multi-dimensional topographic space using terrain parameters extracted from a digital elevation model. Experiment III utilized a more direct mapping of snowpack similarity from a single complete snow depth map together with the easting and northing coordinates. Although all experiments showed a noticeable improvement in the snow patterns in the catchment compared with the deterministic open loop in terms of correlation (

Covering nearly half the land surface of the Northern Hemisphere every year

Snowpack monitoring is a difficult task, particularly in remote regions where meteorological conditions are often severe and logistics are challenging

Earth observation satellites provide information on various snow-related variables such as the snow cover extent, snow depth, snow surface temperature or albedo. However, remote sensing observations are limited by revisit times, spatial resolution, cloud obstruction, viewing geometry and spectral resolution

Data assimilation (DA) has emerged as a promising method for enhancing uncertain numerical snowpack simulation results using available in situ or remotely sensed observations

Despite the aforementioned issues, the question of spatial information transfer has thus far received relatively little attention from land surface modellers in general

Thanks to developments fuelled by operational numerical weather prediction

Despite receiving considerably less attention than its temporal counterpart, some examples of spatio-temporal snow DA can be found.

The aforementioned spatio-temporal snow DA studies were typically performed at moderate resolution, in semi-distributed geometries, and/or using relatively simple snow models. In addition, the quantification of the spatial relationships between cells was typically derived from the Euclidean distance. Including a measure of the elevation proximity between cells helped to account for large differences in SWE for cells that were close in the horizontal dimension but located at different elevations

This work is based on the data available from a time series of 12 hyper-resolution (5

We used these maps to generate sparse observations to be assimilated by randomly selecting 20 cells among all the available grid cells for every map. The complete maps were also used to evaluate the posterior simulations. We assumed that the snow depth maps were an independent source of evaluation given that the assimilated observations only represent 0.11 % of the 18 442 simulated grid cells. The random draw of 20 cells was performed independently for each map, emulating a real case where an observer makes sporadic snow depth probe measurements throughout the snow season. The random sampling led to the selection of several snow-free cells, because many snow surveys were conducted late in the snow season (Fig.
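The subsampling described above can be sketched as follows. This is a minimal illustration, not the MuSA implementation: the function name `draw_sparse_observations`, the seed, and the choice of sampling without replacement within a map are assumptions made here; only the counts (12 maps, 20 cells per map, 18 442 simulated cells) come from the text.

```python
import numpy as np

N_CELLS = 18442  # simulated grid cells (from the text)
N_MAPS = 12      # drone-based snow depth maps (from the text)
N_OBS = 20       # cells sampled per map (from the text)

def draw_sparse_observations(n_cells, n_maps, n_obs, seed=0):
    """Return an (n_maps, n_obs) array of cell indices.

    Each map gets its own independent draw; within a map, cells are
    drawn without replacement (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)  # arbitrary seed, for reproducibility only
    return np.stack([rng.choice(n_cells, size=n_obs, replace=False)
                     for _ in range(n_maps)])

obs_idx = draw_sparse_observations(N_CELLS, N_MAPS, N_OBS)

# Share of the simulated cells that is assimilated for each map, in percent
fraction_pct = 100.0 * N_OBS / N_CELLS
```

Rounded to two decimals, `fraction_pct` reproduces the 0.11 % quoted above.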

Location of the Izas experimental catchment in the central Spanish Pyrenees

We used a meteorological forcing dataset that was previously generated by

This meteorological forcing was used to drive the Multiple Snow Data Assimilation System

Data assimilation is a term used in the geosciences for the ubiquitous exercise of combining models with observations

In this work we focus on ensemble Kalman methods which lend themselves well to spatio-temporal DA thanks to their Gaussian properties and the compatibility with localization methods, as is clear from existing practices in the broader DA community

In spatio-temporal DA, information from observations can be spread across multiple grid cells through non-local observations, correlated observation error or prior dependence

Monte Carlo methods in general and ensemble Kalman methods in particular require the use of an ensemble (i.e. a collection) of model realizations. To generate the prior ensemble of simulations, we used a prior parameter ensemble that is time-invariant within each water year to perturb the precipitation (multiplicative) and air temperature (additive) forcing variables. To bound these parameters within certain limits while satisfying the Gaussian prior assumption of ensemble Kalman methods, they were drawn from logit-normal distributions as outlined in
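The idea behind a logit-normal prior can be sketched as follows: a Gaussian variable is drawn in an unbounded transformed space and then mapped into a bounded interval with the logistic function. The helper names `to_bounded` and `to_unbounded`, the bounds and the Gaussian mean and standard deviation are illustrative assumptions, not the values used in this study.

```python
import numpy as np

def to_bounded(z, lower, upper):
    """Map an unbounded value z to the open interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + np.exp(-z))

def to_unbounded(theta, lower, upper):
    """Inverse (generalized logit) transform back to the Gaussian space."""
    p = (theta - lower) / (upper - lower)
    return np.log(p / (1.0 - p))

rng = np.random.default_rng(1)

# Gaussian draws in the transformed space (hypothetical mean and spread)
z = rng.normal(loc=0.0, scale=1.0, size=100)

# Multiplicative precipitation perturbation bounded in (0, 5): hypothetical bounds
precip_mult = to_bounded(z, lower=0.0, upper=5.0)
```

The ensemble Kalman update would then act on `z`, where the Gaussian assumption is made, while the model is forced with the bounded `precip_mult`.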

Example of correlation values (

Each of the transformed prior perturbation parameters was drawn from an independent high-dimensional multivariate normal distribution (i.e. in the transformed space) by constructing prior spatial covariance matrices. This prior dependence structure allows for associations between the parameters in all the grid cells in our domain, which is key for information propagation as outlined in Appendix

Based on the correlations calculated from the GC function, we construct a correlation matrix
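As an illustration of how a distance-based correlation matrix can be turned into spatially correlated prior draws, the sketch below assumes that "GC" denotes the fifth-order piecewise rational function of Gaspari and Cohn, parameterized so that correlations vanish beyond twice the length scale `c`. The function names, the jitter term and the example grid are hypothetical and are not taken from MuSA.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order piecewise rational correlation; zero for dist > 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    corr = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r <= 2.0)
    rn, rf = r[near], r[far]
    corr[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
                  - (5.0 / 3.0) * rn**2 + 1.0)
    corr[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3
                 + (5.0 / 3.0) * rf**2 - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return corr

def sample_correlated_field(coords, c, sigma, n_ens, seed=0):
    """Draw n_ens zero-mean Gaussian fields with GC correlation.

    coords: (n_cells, n_dim) cell coordinates; returns (n_cells, n_ens).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))        # pairwise Euclidean distance
    cov = sigma**2 * gaspari_cohn(dist, c)
    cov += 1e-8 * sigma**2 * np.eye(len(coords))  # small jitter for stability
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    return chol @ rng.standard_normal((len(coords), n_ens))

# Tiny 6 x 6 grid as a stand-in for the simulation domain
xx, yy = np.meshgrid(np.arange(6.0), np.arange(6.0))
grid = np.column_stack([xx.ravel(), yy.ravel()])
fields = sample_correlated_field(grid, c=3.0, sigma=1.0, n_ens=50)
```

Any other pairwise distance matrix could replace the Euclidean one used here.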

To perform the spatio-temporal DA, we employed the deterministic ensemble smoother with multiple data assimilation (DES-MDA) scheme introduced by

DES-MDA with domain localization.

Run

Store the ensemble of parameters for this grid cell

Store the

Store the corresponding local predicted observations in the

Build the localization matrices

Compute the

Compute the

Compute the

Update the ensemble mean

Update the ensemble anomalies

Combine the mean and anomalies to obtain the updated ensemble
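The mean and anomaly updates listed above can be illustrated with a generic square-root ensemble update. This is a hedged sketch only: it uses a symmetric ensemble-transform form for the anomalies, omits localization and the per-cell storage steps entirely, and is not claimed to match the exact formulation used in MuSA. The names `des_mda_step`, `des_mda`, `forward` and `alphas` are hypothetical.

```python
import numpy as np

def des_mda_step(theta, y_pred, y_obs, obs_var, alpha):
    """One deterministic update with observation error variance inflated by alpha.

    theta: (n_par, n_ens) parameters; y_pred: (n_obs, n_ens) predicted
    observations; y_obs, obs_var: (n_obs,) observations and error variances.
    """
    n_ens = theta.shape[1]
    theta_mean = theta.mean(axis=1, keepdims=True)
    y_mean = y_pred.mean(axis=1, keepdims=True)
    A = theta - theta_mean                  # parameter anomalies
    S = y_pred - y_mean                     # predicted-observation anomalies
    R = np.diag(alpha * np.asarray(obs_var, dtype=float))
    C_ty = A @ S.T / (n_ens - 1)
    C_yy = S @ S.T / (n_ens - 1)
    K = C_ty @ np.linalg.inv(C_yy + R)      # Kalman gain
    new_mean = theta_mean + K @ (np.asarray(y_obs)[:, None] - y_mean)
    # Symmetric square-root transform; it leaves the anomalies centred on zero
    M = np.eye(n_ens) + S.T @ np.linalg.inv(R) @ S / (n_ens - 1)
    evals, evecs = np.linalg.eigh(M)
    T = evecs @ np.diag(evals**-0.5) @ evecs.T
    return new_mean + A @ T                 # combine mean and anomalies

def des_mda(theta, forward, y_obs, obs_var, alphas):
    """Repeat the update, re-running the model before each iteration.

    The inflation coefficients should satisfy sum(1 / alpha) = 1.
    """
    for alpha in alphas:
        theta = des_mda_step(theta, forward(theta), y_obs, obs_var, alpha)
    return theta
```

For a linear forward model, two iterations with `alphas=[2.0, 2.0]` reproduce a single Kalman update of the ensemble mean and covariance, which is a convenient sanity check before applying the scheme to a nonlinear snow model.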

Herein, the DA window is 1 water year and the parameters are updated independently for each water year. This DES-MDA is a deterministic version of the original (stochastic) ensemble smoother with multiple DA

In the modelling pipeline, a crucial step is the determination of the distances between grid cells in the simulation domain. This is typically accomplished through the calculation of the pairwise Euclidean distance between cells. The Euclidean distance between two cells is the Euclidean norm

As explained in the Introduction, we aim to incorporate topographical dimensions in our distance calculations. Thus, it is necessary to account for the differences in the units and the potential correlation between these additional dimensions. To do so, we employ the Mahalanobis distance
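A pairwise Mahalanobis distance can be sketched as below. The sketch assumes that the covariance matrix is estimated from the per-cell feature vectors themselves (e.g. easting, northing and topographic parameters). The source does not confirm this choice here, so it is labelled as an assumption, and the function name is hypothetical.

```python
import numpy as np

def pairwise_mahalanobis(features, cov=None):
    """Pairwise Mahalanobis distance between rows of `features`.

    features: (n_cells, n_dim). If `cov` is None, the sample covariance
    of the features is used (an assumption of this sketch).
    """
    X = np.asarray(features, dtype=float)
    if cov is None:
        cov = np.cov(X, rowvar=False)
    chol = np.linalg.cholesky(np.atleast_2d(cov))
    # Whitening: after this transform, Euclidean distance equals
    # Mahalanobis distance in the original feature space
    Z = np.linalg.solve(chol, X.T).T
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

rng = np.random.default_rng(2)
feats = rng.normal(size=(30, 3))            # synthetic stand-in features
d_maha = pairwise_mahalanobis(feats)

# Changing the units of each feature leaves the distances unchanged
d_rescaled = pairwise_mahalanobis(feats * np.array([1.0, 1000.0, 0.01]))
```

The full pairwise array grows with the square of the number of cells, so for a domain with many thousands of cells it would be computed in blocks or only within a localization radius.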

In this study, we propose three experiments using different distances to construct the prior covariance and explore the potential for spatio-temporal snow DA at hyper resolution:

Experiment I: The prior covariance was constructed using the Euclidean distance in a space of two horizontal dimensions (easting and northing), as is typically done in 3D land DA

Experiment II: The prior covariance was constructed using the Mahalanobis distance in a high-dimensional space that includes three spatial dimensions (easting, northing and elevation) along with four additional topographic dimensions described below.

Experiment III: The prior covariance was constructed using the Mahalanobis distance in a space composed of two horizontal dimensions and one snow depth dimension based on a snow depth map obtained early in the water year (14 January 2020).

In Experiment II, the topographic parameters that define the additional dimensions are the topographic position index (TPI, with a search distance of 25

All the spatio-temporal DA experiments were developed using MuSA. In fact, these new capabilities that we test in these experiments are packaged as an updated version of MuSA where spatio-temporal DA can be activated optionally while preserving the previous capabilities. A variety of different ensemble-based DA schemes were implemented in the original version of MuSA. For this study, we also added the deterministic ES-MDA

Several modifications of the MuSA code were necessary to implement spatio-temporal snow DA capabilities. In the original version of MuSA, each grid cell in the simulated domain was updated independently. This was because both FSM2 and the DA schemes were purely temporal, in the sense that spatial grid cells did not interact. The result is what is known in computer science as an embarrassingly parallel problem, for which parallelization is relatively trivial. However, spatio-temporal DA requires each grid cell to have access to both the observations and ensemble of predicted observations from nearby cells. What constitutes nearby will depend on the distance metric used (i.e. Eqs.

The MuSA framework was originally designed to be a noninvasive Python wrapper around the supported numerical snowpack model (FSM2), which simplifies implementing updated versions of this model and even altogether new models of any type. Staying true to this philosophy, the spatial propagation of information was handled using physical disk input/output (I/O) operations. This design was also intended to alleviate the cost of storing many ensemble simulations in memory. Such a cost is possibly prohibitive in applications with a higher spatial density of observations, since each cell would have to store in memory the ensemble members for all the observed grid cells in its surroundings. Nonetheless, the intensive use of I/O made the computational problem considerably more complex and introduced potential bottlenecks. To overcome this problem, we improved the performance of several internal routines by decreasing the numerical precision of many variables whenever possible and compressing the binary objects to be shared by I/O operations using the high-performance compressor Blosc

The results of the three proposed experiments were evaluated using different strategies. A small percentage of the available grid cells (0.11 % of each map) were included in the assimilation, and therefore the evaluation is essentially performed with independent data despite being compared with the drone data themselves. Even so, to ensure completely independent evaluation, the few cells included in the assimilation were not included in the evaluation metrics for the respective experiments.

As a first step, we compared the spatial patterns of snow depth from the simulations with a complete snow depth map retrieved close to peak SWE (11 March 2020). Different metrics were used to estimate the performance of the different experimental setups. We computed the cell-by-cell difference (error) between the reference map and the posterior mean of the ensembles. For all non-probabilistic metrics and visualizations, we always used the posterior ensemble mean as the point estimate to be evaluated. To measure the performance of the posterior ensembles, we computed the cell-by-cell continuous ranked probability score
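A cell-by-cell ensemble CRPS can be sketched with the common empirical estimator, i.e. the mean absolute error of the members minus half the mean absolute difference between member pairs. Whether this is the exact estimator used in the study is an assumption; the function name is hypothetical.

```python
import numpy as np

def crps_ensemble(ensemble, obs):
    """Empirical CRPS; ensemble members are along axis 0.

    ensemble: (n_ens,) or (n_ens, n_cells); obs: scalar or (n_cells,).
    Lower is better, and the score has the units of the variable.
    """
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Mean absolute error of the members with respect to the observation
    term_obs = np.abs(ens - obs).mean(axis=0)
    # Mean absolute difference between all pairs of members (spread term)
    term_spread = np.abs(ens[:, None] - ens[None, :]).mean(axis=(0, 1))
    return term_obs - 0.5 * term_spread

# Synthetic example: 50 members, 4 cells
rng = np.random.default_rng(3)
ens_depth = rng.gamma(shape=2.0, scale=0.5, size=(50, 4))
crps_map = crps_ensemble(ens_depth, obs=np.array([1.0, 0.5, 2.0, 0.0]))
```

For a single-member ensemble the score reduces to the absolute error, which makes it directly comparable with deterministic runs.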

To visually gauge the overall performance of the experiments, we generated scatter plots showing the simulated versus the observed snow depth for all grid cells in the domain. In addition, we computed two commonly used evaluation metrics: the root mean square error (RMSE) and the linear correlation coefficient (
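The two summary metrics can be computed as in the sketch below, which also drops the assimilated cells from the evaluation as described earlier in this section. The function name `evaluate` and its arguments are hypothetical.

```python
import numpy as np

def evaluate(sim, obs, assimilated_idx=()):
    """Return (RMSE, Pearson r) over cells that were not assimilated.

    sim, obs: flattened maps of equal length (posterior mean and reference).
    assimilated_idx: indices of cells used in the assimilation.
    """
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    keep = np.isfinite(sim) & np.isfinite(obs)      # skip no-data cells
    keep[np.asarray(assimilated_idx, dtype=int)] = False
    s, o = sim[keep], obs[keep]
    rmse = np.sqrt(np.mean((s - o) ** 2))
    r = np.corrcoef(s, o)[0, 1]
    return rmse, r

# Synthetic check: a simulation that is a linear function of the reference
ref = np.arange(10.0)
rmse_all, r_all = evaluate(2.0 * ref + 1.0, ref)
rmse_sub, r_sub = evaluate(2.0 * ref + 1.0, ref, assimilated_idx=[0, 1])
```

A perfect correlation alongside a non-zero RMSE, as in this example, shows why the two metrics are reported together: the first captures the pattern and the second the magnitude of the error.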

Experiment I results on 11 March 2020.

The deterministic open loop (OL) model run was not able to reproduce the intricate spatial patterns of snow depth observed in the drone surveys (Fig.

Experiment II results on 11 March 2020 presented in the same way as those of Experiment I in Fig.

In all spatio-temporal DA experiments, both the distance-based prior covariance matrix and the localization had a strong impact on the DA performance. The results from Experiment I in Fig.

The use of additional dimensions based on topographic parameters in the construction of the prior covariance matrix in Experiment II had a markedly positive impact on the performance of the posterior inference (Fig.

Experiment III results on 11 March 2020 presented in the same way as those of Experiment I in Fig.

Summary of validation statistics for each of the snow depth drone maps and the mean for all the seasons. The highlighted values indicate the best performance value for each date.

As expected, Experiment III, in which a single drone-based snow depth map (14 January 2020) was used to construct the prior covariance, yields by far the most promising posterior results relative to the drone observations (11 March 2020), as shown in Fig.

Comparison of snow depth semivariograms on 11 March 2020 near peak accumulation.

Figure

Time evolution of the total volume of snow in the catchment in units of cubic hectometres (10

The improvements in the simulated spatial patterns that we saw near the peak snow depth date for the various experiments are also reflected in the temporal evolution of the total snow volume in the catchment. Figure

In this work, we investigated the capability of propagating information from sparse observations of the snowpack in hyper-resolution simulations through ensemble-based spatio-temporal DA techniques. We performed three experiments in which we assimilated sparse observations from the Izas experimental catchment. The observations were obtained through random sampling of 20 points from each of the 12 available snow depth maps during the 2020 water year. This set-up was designed to mimic the typical sparse manual sampling of a catchment to test the possibility of propagating this information in distributed snowpack simulations. It should be noted that the selection of grid points for sampling is completely random and differs from one date to another, which may result in many measurements not being as informative as they could be. Additionally, most of the snow depth maps are concentrated at the end of the snow season, at which point a significant portion of the snow has already melted (Fig.

The three experiments that we carried out herein reflect different strengths and weaknesses of spatio-temporal DA techniques when applied at the hyper-resolution scales of snowpack simulation. These three experiments were designed to illustrate the potential of the techniques rather than to find the optimal set-up, which would likely be highly problem-dependent and involve considerable hyperparameter tuning. The configuration used in Experiment I was not able to reproduce the complex spatial patterns present in the drone-based snow depth map near the snow accumulation maximum, although the simulated evolution of the total snow volume was similar to that of the other two DA experiments and equally close to the observations. Despite this weakness, the simplicity of the configuration of Experiment I is a key advantage over the more sophisticated prior covariance modelling experiments. Constructing the prior covariance based solely on the (horizontal) geographic distance between cells allows for a much more intuitive configuration of the correlation length scale

The incorporation of dimensions other than the geographical easting and northing in the distance-based prior covariance and localization markedly improves the results of Experiments II and III compared with both the OL and Experiment I. Experiment II has demonstrated (see Fig.

The results obtained in Experiment III (see Fig.

The spatio-temporal DA techniques that we explored herein also have wider implications. An immediate possible operational application could be to integrate information obtained from the typically sparse national snow-monitoring networks into high-resolution distributed physically based snow simulations, building on the work of

A particularly promising potential application is the assimilation of snow depth acquisitions from the ICESat-2 laser altimeter

Despite the promising results, new configuration and numerical challenges arise when a high-dimensional space is used to define the pairwise distance between cells. For example, the Mahalanobis distance in Eq. (

In this study, we explored the potential of spatio-temporal DA methods for updating hyper-resolution simulations of the snowpack in an experimental catchment situated in the Pyrenees. Three different experiments were proposed and executed, each employing a distinct prior covariance modelling strategy for assimilating sparse snowpack observations that were subsampled from drone-based snow depth maps. The assimilated data consist of 20 randomly selected snow depth measurements obtained from the

The three proposed experiments are essentially built on the same underlying DA scheme (Algorithm

It should be noted that setting up the better-performing experiments (II and III) can entail considerable technical difficulties. In particular, the use of generalized (rather than simpler geographic) multi-dimensional distances to construct the prior covariance matrix and perform localization leads to a less intuitive experimental design. Notably, the performance of these experiments is sensitive to the choice of hyperparameters, especially the correlation length scale

Herein, we outline the role that prior dependence plays in propagating information from local observations. In Sect.

DA can be formalized as an exercise in Bayesian inference

To make the above general derivation more concrete, we consider a specific toy example in the form of a simple model where a scalar variable of interest

Since the model is linear with a Gaussian prior and Gaussian observation error, the posterior is also Gaussian. Thereby, the mean will coincide with the mode which happens to be the unique point at which the gradient of a Gaussian vanishes. One simple way to obtain the posterior mean

The point of deriving the exact posterior for this simple toy bivariate model is to demonstrate the importance of prior dependence, which in the case of the Gaussian prior is controlled through the prior correlation
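The role of the prior correlation can be checked numerically with a small bivariate Gaussian example in which only the first of two cells is observed. The set-up below is a hedged stand-in for the toy model of this appendix: the function name, the variable names and the numerical values are illustrative assumptions.

```python
import numpy as np

def bivariate_posterior(mu, sigma, rho, y, sigma_obs):
    """Exact Gaussian posterior for two cells when only cell 1 is observed.

    mu: prior means (2,); sigma: prior standard deviations (s1, s2);
    rho: prior correlation; y: observation of cell 1;
    sigma_obs: observation error standard deviation.
    """
    mu = np.asarray(mu, dtype=float)
    s1, s2 = sigma
    P = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])     # prior covariance
    H = np.array([[1.0, 0.0]])                 # observation operator: cell 1 only
    K = P @ H.T / (H @ P @ H.T + sigma_obs**2) # Kalman gain, shape (2, 1)
    mean_post = mu + (K * (y - mu[0])).ravel()
    cov_post = P - K @ H @ P
    return mean_post, cov_post

# Uncorrelated prior: the unobserved cell 2 is left untouched
m_unc, C_unc = bivariate_posterior([0.0, 0.0], (1.0, 1.0), 0.0, 2.0, 1.0)

# Correlated prior: information spreads from cell 1 to cell 2
m_cor, C_cor = bivariate_posterior([0.0, 0.0], (1.0, 1.0), 0.5, 2.0, 1.0)
```

With these values the posterior mean of cell 2 stays at 0 for the uncorrelated prior and moves to 0.5 for a prior correlation of 0.5, while its variance shrinks from 1 to 0.875.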

For the more general case of prior dependence obtained when

Simple Gaussian linear example with an uncorrelated prior

As in Fig.

The MuSA code can be found at

Conceptualization was by EAG and KA. Data were curated by EAG. Formal analysis was undertaken by EAG and KA. Funding was acquired by EAG and KA. Investigation was undertaken by EAG and KA. Methodology was developed by EAG, KA, MM and PL. Project administration was the responsibility of EAG and KA. Resources were the responsibility of NP, DT and SW. Software was the responsibility of EAG with key contributions by KA. Supervision was carried out by NP, DT, SW, JILM and SG. Validation was performed by EAG. Visualization was developed by EAG and KA. Writing – original draft preparation was led by EAG and KA, with contributions from all co-authors. Writing – review and editing was the result of the common effort of all co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

This joint research effort was initiated through the JASPER project (no. 337515) funded by the Research Council of Norway. Esteban Alonso-González has been funded by the CNES postdoctoral fellowship. Kristoffer Aalstad and Norbert Pirk were funded by the Research Council of Norway through the Spot-On project (no. 301552) and acknowledge support from the LATICE Strategic Research Initiative at the University of Oslo. Marco Mazzolini and Désirée Treichler acknowledge funding from the Research Council of Norway (SNOWDEPTH project, contract 325519). This study was partially funded by the project SNOWDUST (TED2021-130114B-I00) and MARGISNOW (PID2021-124220OB-100), both funded by the Spanish Ministry of Science and Innovation.

This research has been supported by the Norges Forskningsråd (grant nos. JASPER 337515, SNOWDEPTH 325519, and Spot-On 301552), the Centre National d'Etudes Spatiales (grant no. Postdoctoral fellowship 0103207), and the Ministerio de Ciencia e Innovación (grant nos. MARGISNOW PID2021-124220OB-100 and SNOWDUST TED2021-130114B-I00).

This paper was edited by Jan Seibert and reviewed by two anonymous referees.