Canada's water cycle is driven mainly by snowmelt. Snow water equivalent (SWE) is the snow-related variable most commonly used in hydrology, as it expresses the total quantity of water (solid and liquid) stored in the snowpack. Measurements of SWE are, however, expensive and not continuously accessible in real time. This motivates a search for alternative ways of estimating SWE from measurements that are more widely available and continuous over time. SWE can be calculated by multiplying snow depth by the bulk density of the snowpack. Regression models proposed in the literature first estimate snow density and then calculate SWE. More recently, a novel approach to this problem was developed based on an ensemble of multilayer perceptrons (MLPs). Although this approach compared favorably with existing regression models, snow density values at the lower and upper ends of the range remained inaccurate. Here, we improve upon this recent method for determining SWE from snow depth. We show the general applicability of the method through the use of a large data set of 234 779 snow depth–density–SWE records from 2878 nonuniformly distributed sites across Canada. These data cover almost 4 decades of snowfall. First, we show that estimating SWE directly produces better results than first estimating snow density and then calculating SWE. Second, testing several structural characteristics of the artificial neural network (ANN) improves estimates of SWE. Optimizing MLP parameters separately for each snow climate class better represents the geophysical diversity of snow. Furthermore, the uncertainty of snow depth measurements is incorporated to provide more realistic estimates. A comparison with commonly used regression models reveals that the ensemble of MLPs proposed here yields noticeably more accurate estimates of SWE. This study thus shows that delving deeper into ANN theory helps improve SWE estimation.

The works published in this journal are distributed under the Creative Commons Attribution 4.0 License. This license does not affect the Crown copyright work, which is re-usable under the Open Government Licence (OGL). The Creative Commons Attribution 4.0 License and the OGL are interoperable and do not conflict with, reduce or limit each other. The co-author Camille Garnaud is an employee of the Canadian Government and therefore claims Crown copyright for the respective contributions. © Crown copyright 2021

Snowmelt plays a major role in the hydrological cycle of many regions of the world.

Consequently, many hydrological models include a snow module to estimate the snow water equivalent (SWE). SWE is of great interest in hydrology because it describes the volume of water stored in the snowpack (e.g.,

Snow depth can be measured inexpensively by ultrasonic distance sensors. Furthermore, light detection and ranging instruments (lidar) installed on aircraft can measure snow depth remotely

A number of regression models have already been proposed to convert snow depth to SWE

Physics-based approaches have also been proposed for converting snow depth to SWE. For instance,

Recent studies have suggested the use of artificial neural networks (ANNs) to estimate SWE.

In this study, we build a model to estimate SWE from in situ snow depth measurements and several indicators derived from gridded meteorological time series. This study is a follow-up to the work of

The remainder of the paper is organized as follows. In Sect.

Following

The goal is to approximate a desired function

To determine the parameters, one can optimize over the space of weights and biases using a training data set. The parameters are commonly initialized at random, close to zero, following either a uniform or Gaussian distribution. For regression problems, the mean square error (MSE) is commonly used as the objective function in the optimization.
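The two ingredients just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this study; the layer sizes and the initialization scale are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_mlp(n_in, n_hidden, n_out, scale=0.1):
    """Initialize weights at random, close to zero (Gaussian), biases at zero."""
    return {
        "W1": rng.normal(0.0, scale, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, scale, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def mse(y_pred, y_true):
    """Mean square error, the usual objective function for regression."""
    return np.mean((y_pred - y_true) ** 2)
```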

During the optimization, a non-convex objective function is minimized; this function contains multiple minima of similar performance. The objective function is non-convex with respect to the parameters because of the many symmetric configurations of a neural network: exchanging the bias and weights of one neuron with those of another neuron in the same layer yields exactly the same output. Furthermore, ANN applications usually use input variables that are related to each other, and the interchangeability of dependent input variables results in multiple parameter sets having similar performance. The number of dimensions in the optimization is equal to the number of parameters, and it is also argued by

The concept of stochastic gradient methods has been introduced to avoid saddle points. These methods are based on the ordinary steepest gradient method; however, rather than using the entire training data set (batch) at once, a number of data records (or “minibatches”) are taken for each iteration during the optimization. Consequently, the optimization surface is slightly altered at each iteration. This can sometimes even help the optimization escape shallow local minima. For the interested reader,
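A plain minibatch loop makes the idea concrete: each epoch, the records are reshuffled and the parameters are updated on small random subsets, so the loss surface seen at each step differs slightly. This is an illustrative sketch (the gradient callback and hyperparameters are placeholders, not the setup used in this study).

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_sgd(params, grad_fn, X, y, lr=0.01, batch_size=32, epochs=10):
    """Minibatch SGD: shuffle the records each epoch and update the
    parameters on each small batch rather than on the full batch."""
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grads = grad_fn(params, X[idx], y[idx])
            for k in params:
                params[k] -= lr * grads[k]
    return params
```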

The related studies of

There are several reasons that support the use of an ensemble rather than a single model. First, random parameter initialization can end up in different local minima on the parameter surface with similar performance. This situation is related to the concept of equifinality, introduced by

Finally, in regard to the architecture of the MLP, the universal approximation theorem states that, under mild assumptions, any continuous function can be approximated by a feed-forward neural network with a single hidden layer. This theorem was first proven by

According to

In this section, we introduce a snow classification scheme proposed by

The snow classification system separates the world map into seven different snow classifications at a resolution of

Snow classes across Canada as defined by

Some sites close to the coast are classified as

The empirical cumulative distribution function (ECDF) of the variables snow depth, SWE, and density for each snow class.

The empirical cumulative distribution function (ECDF) of snow depth, SWE, and snow density for each snow class is depicted in Fig.

In this section, we introduce two regression models from the literature, which we use as benchmarks for our MLP-based conversion model, using the performance indicators described in Sect.

In this section, we introduce deterministic evaluation metrics (Sect.

Furthermore, an ensemble is said to be reliable if, for a given simulation probability, the relative frequency of the event equals that probability.

Deterministic evaluation metrics quantify the performance based on the single outcome of a deterministic model and the observation. Here, the median of the ensemble is considered as a deterministic simulation on which these metrics can be applied. The most popular measures are the mean absolute error, the root mean square error, and the mean bias error.

The mean absolute error (MAE), root mean square error (RMSE), and mean bias error (MBE) are defined as

In Eqs. (
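These three standard metrics can be sketched in a few lines of NumPy; here `sim` stands for the ensemble median treated as the deterministic simulation, and `obs` for the observations.

```python
import numpy as np

def deterministic_scores(sim, obs):
    """MAE, RMSE, and MBE of a deterministic simulation (e.g., the
    ensemble median) against the observations."""
    err = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MBE": np.mean(err),
    }
```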

In the empirical case, i.e., without distribution fitting, we consider the sorted predicted ensemble with

Given that the ignorance score evaluates the simulated probability function only at the point of observation, no information about the area surrounding the observation or the shape of the probability function is included.
The continuous ranked probability score (CRPS) addresses this drawback by working directly on the cdf. The CRPS of one record is defined as
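For an empirical ensemble (i.e., without distribution fitting), the CRPS of one record can be computed directly via the well-known identity CRPS = E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from the ensemble and y is the observation. A minimal sketch:

```python
import numpy as np

def crps_empirical(ensemble, obs):
    """Empirical CRPS of one ensemble against a single observation,
    using CRPS = E|X - y| - 0.5 * E|X - X'| over the m members."""
    x = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2
```

For a single-member ensemble the CRPS reduces to the absolute error, which makes it directly comparable with the MAE of a deterministic model.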

Small values of the reliability portion testify to the high reliability of the ensemble simulation. The reliability portion is highly related to the rank histogram, introduced in the next section (Sect.

The rank histogram was developed independently and almost simultaneously by
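The construction of a rank histogram can be sketched as follows: for each record, the observation's rank among the sorted ensemble members is recorded, and a flat histogram over the m + 1 possible ranks indicates a reliable ensemble. This sketch counts members strictly below the observation and ignores tie-breaking, which a full implementation would randomize.

```python
import numpy as np

def rank_histogram(ensembles, observations):
    """Counts of the observation's rank within its ensemble; for m
    members there are m + 1 possible ranks (0 .. m)."""
    ranks = [int(np.sum(np.asarray(ens) < obs))
             for ens, obs in zip(ensembles, observations)]
    m = len(ensembles[0])
    return np.bincount(ranks, minlength=m + 1)
```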

The reliability diagram is a visual tool for assessing the reliability of an ensemble forecast: the relative frequency with which the event is observed is plotted against the forecast probability of the ensemble.

To construct a reliability diagram, we work with the empirical staircase function over the ensemble members as cdf. Let us assume that we have
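The construction can be sketched as follows for a binary event such as "the value exceeds a threshold": the forecast probability is read off the empirical staircase cdf (the fraction of members above the threshold), the probabilities are binned, and the observed relative frequency is computed per bin. The event definition and bin count here are illustrative.

```python
import numpy as np

def reliability_diagram(ensembles, observations, threshold, n_bins=10):
    """Points (forecast probability, observed frequency) for the event
    'value exceeds threshold', using the empirical staircase cdf."""
    probs = np.array([np.mean(np.asarray(e) > threshold) for e in ensembles])
    occurred = np.asarray(observations) > threshold
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    points = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            points.append((probs[mask].mean(), occurred[mask].mean()))
    return points
```

A perfectly reliable ensemble produces points on the diagonal, i.e., the observed frequency equals the forecast probability in every bin.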

Skill scores enable comparing simulated model outputs of different magnitudes. The skill score (SS) is defined as

A variation of climatology is taken as a reference simulation because SWE measurements are not continuous over time in the data set. Records from the training and validation data sets at the same location and within a time window of
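The skill score and the climatology reference can be sketched as below. The skill score follows the usual form SS = 1 − score_model / score_reference (1 is perfect, 0 means no improvement over the reference). The time-window width in the climatology helper is a placeholder, as the value used in the study is not reproduced here.

```python
import numpy as np

def skill_score(score_model, score_reference):
    """SS = 1 - score_model / score_reference: 1 is perfect, 0 matches
    the reference, negative values are worse than the reference."""
    return 1.0 - score_model / score_reference

def climatology_reference(records, site, day_of_year, window=15):
    """Climatology reference: pool SWE records at the same site within
    +/- `window` days of year over all available years (window width is
    a placeholder). `records` holds (site, day_of_year, swe) tuples."""
    pool = [swe for s, d, swe in records
            if s == site and abs(d - day_of_year) <= window]
    return np.mean(pool) if pool else np.nan
```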

Following

Environment and Climate Change Canada

Distribution of the records in the Canadian historical snow survey (CHSS) data set for

For this study, we use the above-mentioned CHSS snow data, collected from 1 January 1980 to 16 March 2017. The data set consists of 234 779 measurements from

Location of the sites of the Canadian historical snow survey (CHSS) data set.

We retrieved daily total precipitation and snow density, as well as the maximum and minimum temperatures for each site of the CHSS from the ERA5 atmospheric reanalyses of the European Centre for Medium-Range Weather Forecasts (ECMWF) provided by the

From snow depth, snow density, total precipitation, and temperature, we obtain the following explanatory variables. This initial pool of variables is based on

averaged daily snow density (ERA5);

snow depth;

number of days since the beginning of winter;

number of days without snow since the beginning of winter;

number of freeze–thaw cycles, with the threshold for freezing and thawing set at

the degree-day index, i.e., accumulation of positive daily temperatures since the beginning of the winter;

the snowpack aging index, i.e., the mean number of days since the last snowfall weighted by the total solid precipitation on the day of the snowfall;

the number of layers in the snowpack estimated from the timeline and intensity of solid precipitation, a new layer being considered to be created if there is a 3 d gap since the last snowfall;

accumulated solid precipitation since the beginning of the winter;

accumulated solid precipitation during the last

accumulated total precipitation during the last

mean average temperature during the last

We set the beginning of winter as 1 September because the seasonal distribution of the CHSS data set has the first snow records starting from mid-September, with the exception of some outliers. The separation of precipitation into solid and liquid parts is done by
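Two of the derived explanatory variables above can be sketched directly from daily temperature series. These are illustrative implementations: the freezing/thawing threshold is a placeholder, as the value specified in the study is not reproduced here.

```python
import numpy as np

def degree_day_index(daily_mean_temp):
    """Degree-day index: accumulation of positive daily mean
    temperatures (deg C) since the beginning of winter."""
    t = np.asarray(daily_mean_temp, dtype=float)
    return float(np.sum(np.clip(t, 0.0, None)))

def freeze_thaw_cycles(tmin, tmax, threshold=0.0):
    """Count days on which the temperature crosses the threshold
    (threshold value here is a placeholder)."""
    tmin = np.asarray(tmin, dtype=float)
    tmax = np.asarray(tmax, dtype=float)
    return int(np.sum((tmin < threshold) & (tmax > threshold)))
```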

Variables having the largest absolute Spearman correlation between the target variable and the last three explanatory variables for

Furthermore, we want to test the incorporation of input uncertainty on one variable, namely snow depth. According to the
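Incorporating input uncertainty on snow depth can be sketched as perturbing each measurement with zero-mean Gaussian noise, one draw per ensemble member. The noise standard deviation and member count below are placeholders, as the measurement-uncertainty specification is not reproduced here; negative depths are clipped to zero.

```python
import numpy as np

rng = np.random.default_rng(7)

def perturb_snow_depth(depth_cm, sigma_cm=1.0, n_members=20):
    """Generate perturbed snow depth inputs for an ensemble by adding
    zero-mean Gaussian noise (sigma_cm is a placeholder for the
    measurement uncertainty); negative depths are clipped to zero."""
    noise = rng.normal(0.0, sigma_cm, size=n_members)
    return np.clip(depth_cm + noise, 0.0, None)
```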

The data set is divided randomly into three parts: training, validation, and testing sets

A summary of the reference MLP ensemble setup and the tests performed to obtain the final MLP architecture. SMLP refers to the single MLP ensemble model and MMLP to the multiple MLP ensemble model. Note that, to determine the correct training time, we track the performance of the model on the validation data set over a range of epochs until the evaluation metrics worsen.

The parameter initialization

The results are divided into three sections. In Sect.

First in Sect.

A comparison of the two target variables is performed by using the initial MLP architecture presented in Table

Comparison of

Rank histogram for different numbers of epochs

Comparison of all tested characteristics worth considering. The number of epochs is selected so that the trade-off between reliability and accuracy of the ensemble is increased. The numbers in bold indicate an improvement throughout the testing.

Parameter initialization

We apply input uncertainty because we cannot train for a sufficiently long period to obtain the best error scores as measured by RMSE, MAE, and CRPS; this inability is due to a loss of reliability and overfitting. This type of overfitting is related to the ensemble and does not describe the overfitting of a single MLP. To better understand, we examine the RMSE in Fig.

Table

For the RMSProp algorithm, we test global learning rates of 0.1, 0.01, 0.001, and 0.0001. The best result is produced with a learning rate of 0.0001. The learning rate of

In regard to the parameter initialization, the equation

Note that shuffling the data produces an almost identical result as that for the reference setup. However, it eliminates bias, which is almost zero beyond 15 epochs.

The results of the sensitivity scores for each explanatory variable are presented in Table

Ordered results of the sensitivity analysis from the least to most influential variable. The score is calculated following Sect.

Stepwise reduction of input variables, ordered according to their influence as determined in Table

Figure

First, we determine the number of hidden neurons and number of epochs for SMLP. Figure

Results using the Combo setup from Table

Rank histograms of the single MLP ensemble (SMLP) with 120 neurons in the hidden layer for different numbers of epochs, evaluated on the validation data set.

Optimal combination of the number of hidden neurons and number of epochs for each snow class within the multiple ensemble model. Optimal combination is determined analogous to the single ensemble model in Fig.

Second, we perform the same analysis for each snow class, to finalize the ensembles, one for each snow class for the MMLP model.
Table

Table

Final setup of the MLPs for the single MLP ensemble model (SMLP) and the multiple MLP ensemble model (MMLP); the ensembles in the MMLP model differ only in terms of their number of hidden neurons and number of epochs. As a comparison, the setup of the ANN ensemble proposed by

In this section, we use the testing data set and evaluate the performance of the SMLP and MMLP models. Table

Performance evaluation of the model with a single MLP ensemble covering Canada (SMLP) and the multiple MLP ensembles (MMLP), evaluated using the testing data set.

The scatter plot in panels

Figure

Distribution of the residuals of the simulated ensemble median and the observations. The boundary of the box shows the first and third quartiles; the caps at the end of the whiskers show the 5th and 95th percentiles.

Next, we apply skill scores to ensure a valid comparison between SWE estimates of differing magnitudes. The climatology of SWE from the CHSS data set is used as the reference simulation (see Sect.

Comparison of the performance of the single MLP (SMLP) and multiple MLP (MMLP) models using

The separation into different snow classes reveals that, as expected, the multiple MLP model eliminates bias in all snow classes, as shown in Fig.

Comparison of the rank histogram of

Furthermore, Fig.

The tundra snow and taiga snow classes have the lowest skill scores in terms of both accuracy and reliability. In particular, the reliability part of the CRPS in Fig.

The maritime snow class shows a slightly better accuracy than the mountain snow class. This difference may relate to the complexity of snow accumulation patterns in mountainous regions, owing to the high spatial and temporal variability of all physical processes and variables in these areas. Also the temperature correction, as explained in Sect.

In the next step, the testing data set is divided into elevation classes from

Furthermore, an analysis over the course of the year shows an improved accuracy for the MLP models compared with climatology for all months, except for July, August, and September. During these 3 months, there is generally very little snow across the country and, consequently, very little data. Additionally, the beginning of the winter, subjectively taken as 1 September, causes a reset to zero for several input variables; the variables of temperature, snow depth, and SWE data remain, however, within their usual ranges. This greatly complicates the proper training of the MLPs. Overall, except for these three problematic months, the accuracy and reliability remain relatively constant throughout the year, with a slightly improved performance in spring and early summer compared with climatology.

The regression models are trained and validated using the same perturbed snow depth data sets that are used to optimize the MLP models. The testing data set is then used for comparison purposes in this section.

An ensemble simulation could be produced by running the deterministic regression models on the perturbed snow depth and collecting the resulting members. However, given that the spread in the simulated ensemble of the MLP models is explained mainly by the various parameter initializations, this approach yields an ensemble that is too narrow when regression models are used. This leads to a comparison of models that has little meaning: the optimization of the Sturm model could be initialized with different parameter sets, whereas the Jonas model has a fixed set of parameters. Therefore, we compare the models using exclusively deterministic evaluation techniques.

In addition to the regression models, we also consider a simple benchmark model, which takes the average observed snow density in the training and validation data sets as a constant snow density and then calculates SWE for the testing data set.
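This benchmark amounts to two lines of code (a minimal sketch; variable names and units are illustrative, with density taken as dimensionless relative density so that SWE carries the units of depth):

```python
import numpy as np

def constant_density_benchmark(train_density, test_depth):
    """Benchmark model: average observed density over the training and
    validation data as a constant, then SWE = depth * density."""
    rho = np.mean(train_density)
    return np.asarray(test_depth, dtype=float) * rho
```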

The results are presented in Table

Comparison of the overall performance evaluation using deterministic performance evaluation metrics of the two MLP models and the regression models, evaluated using the testing data set.

Comparison of the performance of the two MLP and regression models by using

Figure

Scatter plot of median of the SWE ensembles and observed SWE for each model, evaluated on the testing data set; note that the axes are cut off at

Figure

This study tackles some important knowledge gaps regarding the conversion of snow depth to SWE, using ANN-based models. The main focus is on the architecture of the network, and two hypotheses are tested. The first hypothesis holds that using SWE rather than density as the target variable for the ANN will produce more accurate estimates of SWE. The second hypothesis states that in-depth testing of several ANN structural characteristics (e.g., optimization algorithm, activation function, parameter initialization, increasing the number of parameters) can improve the estimates of SWE. We thus investigate whether the ANN model must be trained specifically for different regions, as determined by snow climate classes

Our snow-depth-to-SWE model uses the inputs of snow depth, estimated snow density, and other explanatory variables derived from meteorological data. The available snow data includes snow depth, SWE, and snow density measurements from across Canada, collected over almost 40 years.

We then use an ensemble of multiple MLPs to address the issue of the random parameter initialization during optimization. The approach also provides a probabilistic estimate to gain greater insight into model performance. A trade-off between reliability and accuracy is used as a means of evaluation, which gives a more comprehensive analysis of SWE-estimation models.

Many previous models

In our investigation of model structures, we built two models. One model uses a single MLP ensemble for all of Canada. The second model trains one MLP ensemble for each snow class, as defined by

A sensitivity analysis reveals that a greater number of input variables increases the reliability of the ensemble. Therefore, adding more variables could further heighten the model's reliability. After proposing SWE as the new target variable in this study, short-term and long-term variables regarding precipitation with respect to SWE need to be analyzed. In Table

Regarding the limitations of this study, both models show poor performance for high SWE values, mainly because the amount of available training data is low for those extreme values. Furthermore, the model is not predictive and, in particular, cannot account for the effect of climate change. It is noted that the models require more data than the regression models proposed by

As mentioned in Sect.

Finally, this study shows that networks perform optimally with numbers of hidden neurons far above the commonly used rules of thumb. This motivates looking into network structures having multiple layers.

The code including some testing data is available on GitHub (

KFFN performed all the computations, suggested most of the specific tests to be undertaken, and wrote the initial version of the manuscript. JO suggested edits to the manuscript and provided the initial version of the codes to KFFN, which KFFN then translated to Python and modified. The original idea for this work was from MAB, who also guided the work throughout and edited the manuscript significantly. CG was involved in the guidance for this project as a representative of ECCC and provided a final proofread of the manuscript before its submission.

The authors declare that they have no conflict of interest.

The authors acknowledge the financial support of Environment and Climate Change Canada and are also thankful to the Réseau Météorologique Coopératif du Québec for providing the data required for this study. Furthermore, the authors would like to thank Vincent Fortin and Vincent Vionnet for their contribution and input throughout the project.

This research has been supported by the Environment and Climate Change Canada (grant no. GCXE20M017).

This paper was edited by Ryan Teuling and reviewed by two anonymous referees.