Assimilation of citizen science data in snowpack modeling using a new 1 snow dataset : Community Snow Observations

Ryan L. Crumley1, David F. Hill2, Katreen Wikstrom Jones3, Gabriel J. Wolken3,4, Anthony A. 4 Arendt5, Christina M. Aragon1, Christopher Cosgrove6, Community Snow Observations Participants7 5 6 1Water Resources Science, Oregon State University, Corvallis, OR 97331, USA 7 2School of Civil and Construction Engineering, Oregon State University, Corvallis, OR 97331, USA 8 3Alaska Division of Geological and Geophysical Surveys, Fairbanks, AK 99709, USA 9 4International Arctic Research Center, University of Alaska Fairbanks, Fairbanks, AK 99775, USA 10 5University of Washington, Applied Physics Laboratory, WA 98105, USA 11 6Geography Department, Oregon State University, Corvallis, OR 97331, USA 12 7Citizen scientists participating in the project Community Snow Observations (CSO) 13 14 15 16 Correspondence to: Ryan L. Crumley (ryanlcrumley@gmail.com) 17 18 Abstract. 19 20 A physically-based snowpack evolution and redistribution model was used to test the effectiveness of assimilating crowd-sourced 21 measurements of snow depth by citizen scientists. The Community Snow Observations (CSO; communitysnowobs.org) project 22 gathers, stores, and distributes measurements of snow depth recorded by recreational users and snow professionals in high 23 mountain environments. These citizen science measurements are valuable since they come from terrain that is relatively under24 sampled and can offer in-situ snow information in locations where snow information is sparse or non-existent. The present study 25 investigates 1) the improvements to model performance when citizen science measurements are assimilated and 2) the number of 26 measurements necessary to obtain those improvements. Model performance is assessed by comparing time series of observed 27 (snow pillow) and modeled snow water equivalent values, by comparing spatially-distributed maps of observed (remotely sensed) 28 and modeled snow depth, and by comparing fieldwork results from within the study area. The results demonstrate that few citizen 29 science measurements are needed to obtain improvements in model performance and these improvements are found in 62% to 78% 30 of the ensemble simulations, depending on the model year. Model estimations of total water volume from a sub-region of the study 31 area also demonstrate improvements in accuracy after CSO measurements have been assimilated. These results suggest that even 32 modest measurement efforts by citizen scientists have the potential to improve efforts to model snowpack processes in high 33 mountain environments, with implications for water resource management and process-based snow modeling. 34 35

gathers, stores, and distributes measurements of snow depth recorded by recreational users and snow professionals in high 23 mountain environments. These citizen science measurements are valuable since they come from terrain that is relatively under-24 sampled and can offer in-situ snow information in locations where snow information is sparse or non-existent. The present study 25 investigates 1) the improvements to model performance when citizen science measurements are assimilated and 2) the number of 26 measurements necessary to obtain those improvements. Model performance is assessed by comparing time series of observed 27 (snow pillow) and modeled snow water equivalent values, by comparing spatially-distributed maps of observed (remotely sensed) 28 and modeled snow depth, and by comparing fieldwork results from within the study area. The results demonstrate that few citizen 29 science measurements are needed to obtain improvements in model performance and these improvements are found in 62% to 78% 30 of the ensemble simulations, depending on the model year. Model estimations of total water volume from a sub-region of the study 31 area also demonstrate improvements in accuracy after CSO measurements have been assimilated. These results suggest that even 32 spatial resolution of 0.667 degrees by 0.5 degrees, with a 3 hr temporal resolution beginning in 1979. MERRA2 replaces the older 244 version product with updated assimilation processes to include more weather datasets. 245 can be found at the SNOTEL website (https://www.wcc.nrcs.usda.gov/snow/) and Serreze et al. (1999). Direct links to the 254 SNOTEL websites for the UTS and SLS stations can also be found in the Data Availability section below. 255 256

LiDAR and Photogrammetry Derived Data 257
The airborne photogrammetry survey was conducted on April 29, 2017 with a Nikon D800 36.2 megapixel camera and flown on 258 a fixed-wing aircraft above a portion of the Thompson Pass study area, see Figure 3 for location and extent. An onboard Trimble 259 Global Navigation Satellite System (GNSS) and a base-station were used for positional control. Post-processing was completed 260 with structure-from-motion software to create a digital surface model (DSM) of the photogrammetry-derived snow surface. The 261 airborne LiDAR survey was collected on April 7th and 8th, 2018, using a Riegl VUX1-LR laser scanner flown on a fixed-wing 262 aircraft. An onboard integrated inertial measurement unit (IMU) and GNSS, and a base-station were used to provide positional 263 control for the LiDAR-derived snow DSM. Both RS datasets were evaluated against a previously collected photogrammetry-264 derived DSM from 2014 when no snow was present. An interpolation scheme was used to gap-fill some of the negative values in 265 the snow DSM due to vegetation cover effects. 266

Community Snow Observations Data 286
The CSO program collects snow depth data from citizen scientists in snowy environments worldwide. Full details including links 287 to smartphone apps and tutorials are found at http://communitysnowobs.org. Citizen scientists take several (2 to 4) snow depth 288 measurements within a small area (< 4 m2) using an avalanche probe or other depth measuring device (meterstick, etc.). These 289 measurements are then averaged by the participant and submitted using the app or program preferred by the participant. The 290 submitted data include the global positioning system (GPS) location in latitude and longitude, time and date, and snow depth 291 measurement (cm). The accuracy of the GPS system for each participants' mobile device determines the location error of the GPS, 292 with common errors for mobile phones ranging between +/-4 to 7 m (Garnett & Stewart, 2015;Schaefer & Woodyer, 2015). Since 293 the model resolution is 30 m and 100 m, this level of horizontal error in GPS location is acceptable for the purposes of our research 294 questions. All collected data are made freely available on the CSO website for visualization and download (see Section 9 for Data 295 Availability). Thousands    We performed model calibration using five years of the historical record of the UTS station from WY2012 through the end of 308 WY2016. The calibration was focused on adjustments to temperature lapse rates, precipitation lapse rates, wind adjustment factors, 309 and use of the SnowTran3d sub-model. We chose temperature lapse rates and precipitation lapse rates for calibration because 310 SnowModel is known to be limited by these factors when large elevational differences exist within the model domain (Liston and 311 Elder, 2006). We chose wind adjustment factors and the wind transportation sub-model for calibration because wind redistribution 312 of snow plays a significant role in the study area based on the 2018 fieldwork and the RS surveys from 2017 and 2018. Since the 313 SnowAssim sub-model requires a single layer snowpack, no adjustments were made to the snowpack layer structure. For each 314 weather reanalysis product a full calibration was performed for the 30m and 100m model resolutions, in the event that spatial 315 resolution plays a significant role in parameter selection. See Appendix A for the descriptions of the model parameters tested 316 during the calibration. 317

318
The daily SWE output from each calibration simulation is compared with the UTS observed SWE for the duration of the 5-year 319 calibration time period using root mean squared error (RMSE), the Nash Sutcliffe Efficiency (NSE), the Kling-Gupta Efficiency 320 (KGE), and mean bias error (Bias) to assess the calibration simulations.   Calibration results in Table 1 show that the 30m model grid resolution slightly outperforms the 100m model grid resolution in the 330 MERRA2-forced calibration simulations. However, the CFSv2-forced simulations show no difference between the model grid 331 resolutions. The CFSv2 product slightly outperforms the MERRA2 product in terms of SWE RMSE. Overall, the differences 332 between the top performing model grid resolution and reanalysis product are mixed and potentially negligible, varying by metric. 333 The NSE and KGE model performance metrics in the calibration simulations are lower than expected, due primarily to precipitation 334 inputs from the reanalysis products that were consistently higher than measured precipitation at the UTS station. Since SnowAssim 335 adjusts the precipitation fields during assimilation, these input deficiencies are acceptable for the purposes of this study. The 336 SnowModel default parameter values notably and consistently produce the top performing simulations, see Appendix B for details. 337 Due to each of these factors, the calibrated model for the remainder of the study uses the CFSv2 reanalysis product, the 30m model

Experimental Design 353
With the model calibrated, we carried out a series of simulations in order to (1) quantify the improvement in model performance 354 due to the assimilation of CSO measurements and to (2) understand the effects of the number of CSO data points selected for 355 assimilation. Model simulations without using CSO measurements provide a baseline for comparison, referred to as the NoAssim 356 case. Ensemble model simulations were also carried out with various numbers of CSO measurements assimilated, referred to as 357 the CSO simulation case. An ensemble of 60 trials per year were carried out with n = 1, n = 2, n = 4, n = 8, n = 16, and n = 32, 358 where n equals the number of CSO measurements assimilated per WY. In each instance (n value), 10 realizations of the numerical 359 experiment were carried out. 360

361
The timeframe of the assimilating CSO measurements was restricted to the peak SWE period or later. According to the UTS station, 362 peak SWE in the study area generally occurs mid-to late-April and consequently the earliest assimilation date was set to April 363 15th. The CSO measurements were aggregated by week because initial simulations suggested that daily increments were not 364 producing realistic results by SnowAssim. Additionally, CSO participation in the Thompson Pass region during the early 365 accumulation season was infrequent in WY2018 and non-existent in WY2017. Since peak SWE is important for mountain 366 hydrology and ecology, with many snow studies using it as an indicator metric, the time restrictions are acceptable for the research 367 questions addressed in this study (Bohr and  The following results reflect the three types of available validation datasets: 1) time-series SWE results at the UTS station, 2) 372 spatial snow depth distributions from the RS datasets, and 3) point-based snow depth and SWE measurements from the 2018 373 fieldwork. 374 375

Temporal Results Using the Upper Tsaina SNOTEL Station 376
The temporal results compare the UTS station SWE time-series to the ensemble member SWE time-series during WY2017 and 377 WY2018. Figure 5 displays the temporal cycle of snowpack accumulation and ablation, and the timing of peak SWE. At the UTS 378 station in the study area, the average WY day of peak SWE is 228, or April 15th. Before this day, the snowpack is generally 379 increasing in SWE and afterwards the snowpack generally enters the ablation period with a reduction in SWE. This temporal cycle 380 can be observed in Figure

422
The snow depth distribution maps in Figure 7 display the RS datasets (a,b), the results from the highest performing CSO simulation 423 (c,d), and the NoAssim case for each WY (e,f). Refer to Figure 2 for the RS dataset location within the study area. We present the 424 Best CSO simulation as the focus of Section 6.2 ranked according to KS score ranking ( Figure 6). A full accounting of each 425 ensemble member and their spatial distribution ranking can be found in Appendix E. In the RS datasets, there is more variation 426 and heterogeneity in snow depth across short distances (Figure 7a-b). This spatial diversity is evident even after the RS dataset has 427 been aggregated to correspond to the model resolution at 30 m, as depicted in Figure 7. The NoAssim case and Best CSO simulation 428 show less spatial diversity, and the NoAssim case broadly overstimates snow depth when compared to the Best CSO simulation 429 for both WYs. The visualization of the snow depth distributions in Figure 7 illustrate the challenges of accurately representing the 430 process scale through physics-based modeling at low resolutions (Blöschl 1999), and some of these challenges will be examined 431 further in the discussion section.    WY2017 (a,b) and WY2018 (c,d).

454
The multimodal distribution of snow depths in the modeled results can be explained by their relationship to the elevation of the 455 surrounding terrain. The input DEM and the snow depth distributions were compared on a grid-cell-to-grid-cell basis using a two-456 dimensional histogram (2DH). Figure 9 is a series of 2DHs that display snow depth (x axes) versus the input DEM (y axes) in the 457 RS area from both years. Darker colors indicate a higher frequency of snow depth and elevation values corresponding to each 458 dataset. The 2DHs show a proportional relationship between the modeled snow depths (Figure 9 a, Figure 10 shows the co-located SWE depth measurements (y axes) versus the snow depth measurements (x axes) from 474 each site aggregated by month. The bars in Figure 10 represent the variability in snow depth within the surrounding 100m2 of the 475 SWE measurement, including the average, minimum, and maximum of 8 snow depth measurements at each site. submission, we separated those measurement sites used in the assimilation scheme from the validation set when creating Table 3.

Spatially Averaged Snow Water Equivalent Results 495
Another way to quantify the ability of CSO measurements to constrain SnowModel output is to investigate the modeled SWE 496 averaged over a large area. Table 4 contains the spatially averaged SWE estimations from the RS survey area in WY2018, and 497 includes the RS dataset, the Best CSO simulation, and the NoAssim case. We focus on WY2018 because the fieldwork 498 measurements include estimated bulk density values at each measurement site. These bulk density estimations were measured 499 during April 2018 and were partitioned from the larger dataset and spatially averaged over the RS region only (n=22). The 500 fieldwork estimated bulk density value was then applied to the spatially averaged RS snow depth. For the Best CSO simulation 501 and the NoAssim case, the spatially averaged snow depth, SWE, and snow density values were taken directly from the model 502 https://doi.org/10.5194/hess-2020-321 Preprint. Discussion started: 15 September 2020 c Author(s) 2020. CC BY 4.0 License. Table 4 demonstrate that SnowAssim can constrain the SWE output over a large region 503 based on a few, randomly chosen CSO measurements. Importantly, the accuracy of the total modeled water volume from the RS 504 region in 2018 improves when CSO measurements are included, a key finding that has implications for water resource management 505 decisions in snowy, data-limited, mountain environments. 506 507

Precipitation Adjustment Experiment 513
The experimental design of the present study was developed for remote locations where a long-term precipitation dataset was not 514 available to bias correct the precipitation inputs. However, since a long-term precipitation dataset may be available in other 515 locations, we decided to test the results with a precipitation experiment. In this experiment we applied a scalar to the CFSv2 516 precipitation fields for bias correction and all other model parameters and input datasets were held constant. The experiment results 517 show that some of the CSO ensemble simulations still outperformed the NoAssim case with the precipitation adjustment, both 518 spatially and temporally. For example, the spatial results show that 43% percent of the ensemble runs in WY2017 and 20% of the 519 ensemble runs in WY2018 outperformed the NoAssim case when the precipitation was bias corrected, according to their KS score 520 ( Figure 11). Similarly, the temporal results show that 42% of the ensemble runs in WY2017 and 58% of the ensemble runs in 521 WY2018 outperformed the NoAssim case when the precipitation was bias corrected, according to their KGE score. The ECDF 522 and histogram analysis from the precipitation adjustment factor experiment also show model improvements when there was broad 523 underestimation of snow depths in the NoAssim case in WY2017 and broad overestimation in WY2018. These results demonstrate 524 that using CSO measurements for assimilation can improve model performance when the available weather forcing dataset has 525 known biases (no precipitation adjustment factor case) but when those biases have been decreased (precipitation adjustment factor 526 case) the improvements become less clear, they vary from year to year, and are less consistent between spatial and temporal results. One of the limitations of the present study is that the physical and temporal characteristics of the CSO measurements like aspect, 577 elevation, and early-season measurements were not fully tested. Initial simulations demonstrated that SnowAssim performs best 578 when the assimilated measurements were located close in time to the validation dataset. This factor influenced our choice to focus 579 on the late-season time period of CSO measurements since the RS surveys were conducted in the late-season. Additionally, since 580 the majority of the CSO measurements for both WYs occurred between March 15th and May 15th, future research should be in a 581 location where CSO measurements are obtained frequently throughout the accumulation season. A research project with many 582 measurements throughout the accumulation period may provide more insights into the temporal aspects of assimilation of CSO 583 measurements. We decided not to subset the CSO measurements by geophysical characteristics like aspect, elevation, and land 584 cover type because these require additional analysis that is outside of the scope of the current study. Understanding the effects of 585 temporal and spatial restrictions of CSO measurements on model performance will likely be an area of future research. 586 Additionally, it may be necessary to test other process models and alternate assimilation schemes in the future to improve the 587 spatial distribution of model results and determine if CSO measurements can be used in other modeling contexts. 588 were carried out during the 2017 and 2018 snow seasons to investigate the effects of incorporating citizen science measurements 593 into the model chain using an assimilation scheme. Time series SNOTEL station records, remotely sensed photogrammetry and 594 light detection and ranging surveys, and fieldwork observations are used to validate the modeled snow depth and snow water 595 equivalent distributions. Any number of CSO measurements assimilated improves model performance, from 1 to 32. Our results 596 demonstrate that using CSO measurements for assimilation can improve model performance when the available weather forcing 597 dataset has known biases and also when those biases have been decreased by using a precipitation adjustment factor. The 598 improvements in model performance from CSO measurements occur in 62% to 78% of the ensemble simulations both spatially 599 and temporally, and in cases when the model broadly overestimates or underestimates snow depth and SWE. Model estimations 600 of total water volume from a sub-region of the study area also demonstrate improvements in accuracy after CSO measurements 601 have been assimilated. This study has implications for water resource management and snow modeling in locations where in-situ 602 snow information is limited but snow enthusiasts often visit, since even small numbers of assimilated CSO    The datasets used in this study can be found at the following locations. 636 code written for this project is available at https://github.com/snowmodel-tools/preprocess_javascript (last accessed: 30 652 April 2020). 653 654 6.
To convert the CFSv2 data downloaded from Google Earth Engine to the necessary input file for MicroMet we 655 wrote Matlab scripts that can be downloaded via Github at https://github.com/snowmodel-tools/preprocess_matlab (last 656 accessed: 30 April 2020).