Advances in Soil Moisture Retrieval from Multispectral Remote Sensing Using Unmanned Aircraft Systems and Machine Learning Techniques

We developed machine learning models to retrieve surface soil moisture (0 – 4 cm) from high-resolution multispectral imagery using terrain attributes and local climate covariates. Using a small unmanned aircraft system (UAS) equipped with a multispectral sensor we captured high-resolution imagery in part to create a high-resolution digital elevation model (DEM) as well as quantify relative vegetation photosynthetic status. We tested four different machine learning algorithms. The boosted regression tree algorithm gave the best prediction with mean absolute error of 3.8 % volumetric water 15 content. The most important variables for the prediction of soil moisture were precipitation, reflectance in the red wavelengths, potential evapotranspiration, and topographic position indices (TPI). Our results demonstrate that the dynamics of soil water status across heterogeneous terrain may be adequately described and predicted by UAS remote sensing data and machine learning. Our modeling approach and the variable importance and relationships we have assessed in this study should be useful for management and environmental modeling tasks where spatially explicit soil moisture information is important. 20 List of Acronyms ALE Accumulated local effects ANN Artificial neural network BRT Boosted regression trees DEM Digital elevation model 25 MAE Mean absolute error MBE Mean bias error NDVI Normalized difference vegetation index NIR Near-infrared PET Potential evapotranspiration 30 RF Random forest RMSE Root mean square error RVR Relevance vector regression SVR Support vector regression https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c © Author(s) 2020. CC BY 4.0 License.


(Ben
. Soil reflectance in the visible to shortwave infrared spectral region generally decreases with an increase in soil moisture, with some parts of the spectrum showing a more pronounced decrease than others 95 (Haubrock et al., 2008;Weidong et al., 2002). The hydroxide bond is the strongest absorber in the near-infrared region and free water in soil pores has strong absorption around 1.4 and 1.9 µm wavebands (Malley et al., 2004). Several hyperspectral techniques to estimate soil moisture content have been developed such as the Soil Moisture Gaussian Model (SMGM) (Whiting et al., 2004) and the Normalized Soil Moisture Index (NSMI) (Haubrock et al., 2008).
In the presence of vegetation cover, however, the ability to use soil reflectance to measure soil moisture is limited (Muller and 100 Décamps, 2000). In addition, soil reflectance of solar radiation represents only the upper 50 µm of soil, and this makes it challenging to estimate moisture conditions in deeper layers (Malley et al., 2004). Most soil moisture remote sensing approaches operating in the optical range relay on developing an empirical spectral vegetation index (Barrett and Petropoulos, 2014). Several soil moisture measurement methods based on vegetation index proxies have been suggested as vegetation indexes are extremely sensitive to water stress, and they allow indirect estimates of soil moisture (Zhang and Zhou, 2016). 105 Many studies have focused on deriving surface soil moisture content from synergistic use of remote sensing data acquired simultaneously in the optical and thermal infrared spectrum. The so-called 'universal triangular relationship' is a widely used method for estimating soil moisture (Nichols et al., 2011;Sobrino et al., 2014).
The advantage of empirical relationships is that they are typically fast to derive and do not require too many inputs (Ali et al., 2015). The disadvantages of empirical models are the need for higher quality ground measurement, which could be time 110 consuming and expensive, and that the derived relationship is typically site and sensor dependent which limits the possibility to extend their use in a different area readily.
Some disadvantages specific to remote sensing methods in the optical and thermal infrared spectrum are the fact that these wavelengths have shallow soil penetration and require cloud-free conditions. Many of the optical and thermal infrared synergistic approaches require a wide range of both vegetation index and soil moisture conditions within a study region which 115 cannot always be satisfied (Barrett and Petropoulos, 2014).
The advantages of machine learning techniques in remote sensing are their ability to learn and approximate complex nonlinear mappings and the fact that no assumptions need to be made about data distribution. They can thus integrate data from different sources with poorly-defined or unknown probability density functions (Ali et al., 2015). Machine learning techniques have often been shown to outperform other parametric approaches (Ali et al., 2015;Paloscia et al., 2008). Furthermore, 120 machine learning techniques improve with an increasing number of observed datasets. Some of the limitations of machine learning methods are the need for a large number of training data, which require extensive ground truth datasets, and that machine learning methods are black boxes and only limited inference can be made about the relationships of different inputs.

5
Remote sensing from unmanned aircraft systems (UAS) has the potential to address several limitations of traditional remote sensing. The most attractive feature of UASs is their high spatial resolution, frequent or on-demand image acquisition, and 125 low operating costs (Anderson and Gaston, 2013;Berni et al., 2009;Colomina and Molina, 2014;Elarab, 2016). UAS is an umbrella term that refers to the unmanned aircraft and the complementary ground control and communication systems necessary for air surveys (Singh and Frazier, 2018).

Objectives
The purpose of this research was to advance soil moisture change measurement, process understanding, and prediction using 130 remote sensing products from UAS and machine learning methods. In this study, the spatial and temporal scale limitations were addressed by deploying multispectral remote sensing with small UAS and address the challenge of retrieving surface soil moisture changes using machine learning methods and fusing remote sensing data with ground data and meteorological data.
The specific goals of this study were to: (1) develop an adaptable method to retrieve information on surface soil moisture from small UAS remote sensing products and machine learning methods, (2) identify important reflectance and surface 135 characteristics for the prediction of soil moisture changes (3) identify appropriate spatial resolutions of reflectance images and terrain variables for estimating soil moisture, and (4) explore the relation of soil moisture to surface properties.

Background on The Machine Learning Algorithms Used
Several machine learning algorithms exist for multivariate regression modeling. Artificial neural networks (ANN) are among the most commonly used algorithms for the retrieval of soil moisture from remote sensing (e.g., Hassan-Esfahani et al., 2015;140 Paloscia et al., 2008). In recent years, the support vector machine (SVM) and the similar support vector regression (SVR) algorithms have become popular in the retrieval of soil moisture (e.g., Ahmad, Kalra, & Stephen, 2010;Zaman & Mckee, 2014;Zaman, McKee, & Neale, 2012). Other popular machine learning algorithms include tree-based models such as the random forest (RF) and boosted regression trees (BRT).

Artificial Neural Network (ANN) 145
ANN models have been widely used in the development of PTFs (Matei et al., 2017;Pachepsky et al., 1996;Schaap et al., 2001;Zhang et al., 2018;Zhang and Schaap, 2017). ANNs are universal approximators that can approximate any nonlinear mapping. The feed-forward neural network is a popular variant of ANN. In this study, we implemented the feed-forward neural networks with a single hidden layer.

Support Vector Regression (SVR) 150
SVR is an adaptation of the support vector machine (SVM) for regression problems (Cortes and Vapnik, 1995;Drucker et al., 1997). The SVM learning is a generalization of 'maximal margin classifier.' The algorithm first maps the input variables into https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License. a high-dimensional space using a fixed mapping function-a kernel function. The algorithm then constructs hyperplanes, which can be used for classification or, in the case of SVR, for regression. In this study, we use the Radial Basis Function kernel, which is one of the most commonly used kernels in SVR. Some advantages of SVR include the fact that they do not 155 suffer from the problem of local minima, and that they have few parameters to tune when training the model.

Relevance Vector Regression (RVR)
Like SVM, the RVR originally introduced as a classification machine (Tipping, 2000). RVR is a Bayesian treatment of the SVM prediction function which avoids some of the limitations of SVM algorithms, such as reducing the use of basis functions and the need for optimizing the cost and the insensitivity parameters (Ben-Shimon and Shmilovici, 2006). Torres-Rua et al. 160 (2016) successfully used the RVR algorithm to estimate surface soil moisture from satellite image and energy balance products.

Random Forest (RF)
RF are popular models that are relatively simple to train and tune (Hastie et al., 2009). They apply ensemble techniques by averaging a large number of individual decision tree-based models. Tree models are 'grown' by searching for a predictor that ensures the best split that results in the smallest model error. The individual trees in RF ensemble are built on bootstrapped 165 training sample, and only a small group of predictor variables are considered at each split, this ensures that trees are decorrelated with each other (Breiman, 2001;James et al., 2013).

Boosted Regression Trees (BRT)
BRT is another form decision tree model ensemble enhanced by the gradient boosting approach. The gradient boosting algorithm constructs additive regression models by sequentially fitting 'simple base learner' functions (i.e., decision trees) to 170 current pseudo-residuals at each iteration (Friedman, 2002). These pseudo-residuals are the gradient of the loss function being minimized. BRT models have shown considerable success and often outperform other machine learning algorithms in many situations (Elith et al., 2008;Natekin and Knoll, 2013). BRT models are also particularly adept for less-than-clean data (Friedman, 2001), which makes them particularly attractive in our work where the training data is compiled from various sources and different measurement methods which makes it prone to some inconsistencies. 175 Tree-based models, both the RF and BRT, have the advantage of being able to rank predictor variable's relative importance.
In these models, the approximate relative influence of a single predictor variable is calculated as the empirical improvement of predictions by splitting on that predictor at each node and then averaging the relative influence of the variable across all trees of the model (Ridgeway, 2012).

Methods 180
Multispectral images of the study area were collected on six different days throughout the 2018 water year using a UAS equipped with a multispectral camera. High-resolution digital elevation model (DEM) was generated from the stereo images using photogrammetric software, and multiple sets of terrain variables were calculated. Concurrently with the image acquisition flights, moisture content of top 3.8 cm of soil was measured at predefined sampling locations. The ground soil moisture measurements, multispectral reflectance, terrain variables, and rainfall and potential evapotranspiration (PET) data 185 were then aggregated into a data table and used to train a machine learning model to predict the soil moisture. Figure 1 shows the model building process.

Study Site
The study was conducted in a small grassland catchment at the Merced Vernal Pools and Grassland Reserve located about five kilometers northeast of the City of Merced, California. Located at the Central Valley of California, the study site has a Mediterranean climate with hot, dry summers and cool, wet winters with an average annual precipitation of 330 mm.
The Merced Vernal Pools and Grassland reserve covers an area of about 26.6 km 2 and protects hundreds of ephemeral pools 195 and wetlands (Wong, 2014). The reserve was historically and still is used for livestock grazing.
Our study site is a 0.6 km 2 area of land located within a sub-catchment that contributes to the Avocet Pond, a large stock pond located in the northeast corner of the Reserve (Figure 2). The catchment was selected because of an extensive hydrologic https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License. 8 modeling study that was being conducted on the site at the time (Fryjoff-Hung, 2018). We studied a small subset of an area investigated by Fryjoff-Hung (2018) to ensure similar land properties represented by our ground sampling. 200 The study area soils are dominated by Redding gravelly loam (Fine, mixed, active, thermic Abruptic Durixeralfs) soils. The 205 elevation of the study area ranges from 118 to 162 m above sea level, and slope ranges from 0 to 31°. The distribution of four topographic variables-elevation, slope, flow accumulation area, and topographic position index-within the study site is given in Figure S1.
The fact that vernal pool ecology is predominantly controlled by large seasonal shifts and high spatial variability in hydrology make the study site particularly attractive to the proposed research. UAS have the potential to provide information at 210 appropriate spatial and temporal scales for vernal pool studies (Stark et al., 2015). Knowing soil moisture dynamics at a higher frequency is important, especially during the seasonal transition times. The annual seasonal cycle of the study site is shown in

Data Collection
The imagery was acquired on six days during the 2018 water year green-up and brown-down using a fixed-wing UAS with a multispectral camera onboard (See Table 3). Figure 4 shows a typical scene of the study site during wet and dry seasons. Point soil moisture measurements (top 4 cm) were collected with a time-domain reflectometry (TDR) probe across precise sampling 220 transects identified with real-time kinematic (RTK) positioning survey. Daily rainfall and PET values were acquired from nearby weather stations.

Image acquisition and processing
The UAS remote sensing flights were conducted in the late mornings to mid-day during clear weather conditions. A single remote sensing campaign takes approximately three to four hours, meaning images were acquired between approximately 10:00 AM and 2:00 PM.
Multispectral images were acquired using Parrot Sequoia Sensor (Parrot SA, Paris, France) equipped with a sunshine sensor 230 that measured irradiance at the sensor spectral wavebands for radiometric normalization. The camera is deployed on a fixedwing unmanned aircraft (Finwing Sabre, Finwing Technology) with an average flight height of 120 m above ground level.
Images of a calibrated reflectance panel (MicaSense, Inc, Seattle, WA) were taken before each flight and used in the radiometric calibration of images.
The Parrot Sequoia sensor captures four separable bands in the green, red, red edge, and near-infrared bands with a focal length 235 of 3.98 mm and resolution of 1280x960 pixels. A fifth channel captures a high-resolution image in the visible spectrum with a focal length of 4.88 mm and a resolution of 4608x3456 (Pix4D, n.d.). About 12,000 images are captured per flight with a ground pixel resolution of approximately 10 to 15 cm. Images are mosaicked, orthorectified, and radiometrically calibrated using Pix4D photogrammetry software (Pix4D, Lausanne, Switzerland). The Pix4D software also generates DEM photogrammetrically from stereo-images. 240

In situ soil moisture measurement
The moisture content of the top 4 cm soil was measured simultaneously with UAS remote sensing flights using FieldScout TDR-300 soil moisture meter equipped with a 3.8 cm probe (Spectrum Technologies Inc., IL, USA). The FieldScout TDR-300 measures volumetric water content using time-domain reflectometry with a resolution of 0.1% and an accuracy of ±3% (Spectrum Technologies Inc., 2009). 245 We identified six, 90 m long transects within the watershed as sampling transects. To ensure that the sampling transects run over a variety of topography and do not fall within a topographically homogenous area, we generated hydraulic terrain variables using the DEM prior to the selection of transect locations. We generated sub-basin boundaries, flow accumulation, topographic wetness index, and stream network maps. We laid out the sampling transects in a way that they traversed across multiple values in terms of topographic wetness index and flow accumulation and ensuring that they cross, and not follow, a stream network. 250 We placed the six transect so that each fell in a separate sub-basin within the Avocet basin. Three of the transects fall in separate sub-basins that feed into the Avocet pond, and the remaining three are in sub-basins below the Avocet pond.
Once we decided on the location of the sampling transects the location of the two ends was recorded accurately using RTK positioning survey and marked with a metal peg hammered into the ground to allow for repeated measurement at the same location. During the soil moisture measurement campaign, we temporarily affixed a 90 m tape measure at the two ends of the 255 transect and took soil moisture measurement about every 10 m noting the exact distance of the sampling point from the transect ends.

Meteorological variables
Daily precipitation data was retrieved from the UC Merced weather station located approximately six km southwest of the study site (California Department of Water Resources, 2018). Daily reference evapotranspiration data was retrieved from the 260 California Irrigation Management Information System's Merced station located approximately 10 km south of the study site https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License.
(California Irrigation Management Information System, 2018). The Reference evapotranspiration is evapotranspiration from standardized grass calculated using the modified Penman (CIMIS Penman) and the Penman-Monteith equations (California Irrigation Management Information System, n.d.). The reference evapotranspiration is considered as PET in this study.

Data Processing 265
To prepare the data for machine learning. We compiled all the information into a table with the measured soil moisture from each sampling point and date organized into one column. Each row contained the accompanying information for that sampling point and time.

Geometric Correction 270
Pix4D software was used to process the images. Between 7 to 9 ground control point targets (GCPs) with precise locations identified by RTK survey were used for photo alignment. The mean georeferencing root-mean-square-errors (RMSE) of the GCPs ranged from 0.6 -2 cm, and mean reprojection errors ranged from 0.1 -0.2 pixel based on the bundle block adjustment error assessment report. DEM was generated using the structure-from-motion technique; noise filtering and mild surface smoothing (sharp smoothing) were applied to correct for noisy and erroneous points of the point cloud. The inverse distance 275 weighting algorithm was used to interpolate between points to create the raster DEM.

Radiometric correction
Along with capturing images in the four spectral bands, the multispectral camera records the location, orientation, and solar irradiation using its GPS, inertial measurement unit (IMU), and sunshine sensor, respectively. Radiometric calibration by the Pix4D software considers the positional data, solar irradiance measurements, and gain and exposure data from the camera to 280 convert raw digital numbers into sensor reflectance values. Sensor reflectance represents the ratio of the reflected light to the incoming solar radiation and provides a standardized measure that is directly comparable between images. Finally, surface reflectance is calculated in post-processing, taking into account the camera's orientation, the angle of the sun, and the known reflectance values of the calibration panel.

Feature engineering 285
We calculated several variables based on the multispectral reflectance, terrain, and meteorological data to be used to train a machine learning model as predictor variables. A list of all the measured and calculated variables used in modeling soil moisture are given in Table 1.

Reflectance based vegetation index
We calculated the Thiam's Transformed vegetation index (TTVI) based on the red and near-infrared bands (Equation 1). 290 The TTVI is a transformation of the commonly used Normalized Difference vegetation index (NDVI). The reason for choosing of TTVI over NDVI is that it eliminates negative values and transforms NDVI histograms into a normal distribution.

Terrain variables
A list of the calculated topographic variables and their description is given in Table 1. Topographic variables derived from 295 DEM are scale-dependent, to account for this, we calculated all topographic variables on six different resolution DEM. For this, we first upscaled the DEM from the original resolution of 6.85 cm to 15, 30, 60, 100, 300, and 500 cm cell resolution.
We then calculated topographic variables on all the resolutions.
The calculation of topographic position index (TPI) does not only depend on DEM resolution but on the definition of inner and out radii of the annulus (see Equation 2). 300 We calculated TPI for different neighborhood sizes using the ArcGIS 10.5 Land Facet Corridor Tool (Jenness et al., 2013).
We calculated TPI on three DEM resolutions (100, 300, and 500 cm) with two inner radii (1 and 3 cells) and three outer radii (3, 5, and 7 cells). The raster images of selected topographic variables are shown in Figure S2.

Meteorological variables
Precipitation and PET are two important drivers of surface soil moisture. We used the cumulative water year precipitation and PET. We also calculated rolling sums of those variables with different time windows before the measurement, we calculated 1-, 2-, 3-, 7-, 15, and 30-day cumulative precipitation and PET before sampling dates and used those rolling sums as input.

Data aggregation
We had a total of 406 soil moisture measurements across the six different measurement times. For each soil moisture, we extracted the raster values of reflectance and terrain variable by using raster to point data extraction tool in ArcGIS Pro using a 100 cm radius means. For this, ground coordinates of sampling locations were overlaid onto the geo-referenced image, and the pixel values across the bands representing the center of each sampled area were extracted to result in tabular data. We also 320 included the meteorological variables for each soil moisture reading.

Variable transformation
Prior to model training, all the predictor variables were standardized by centering the training variable's mean to zero and scaling by the variable's standard deviation, as shown in Equation (3): where ′ is the centered and scaled value of variable and; ̅ and are the arithmetic mean and standard deviation of the variable.

Variable selection
Variable selection (or feature selections) involves the selection of a subset of relevant variables (features) from a larger set of potential predictors. The benefits of variable selection include improvement of model performance, reducing training and 330 utilization times, and facilitating data understanding (Guyon and Elisseeff, 2003;Weston et al., 2003). We employed three methods of variable selection: tests of linear correlation and linear dependencies among variables, and recursive feature elimination. Recursive feature elimination involves removing the least important features whose omission has the least effect on training errors (Chen and Jeong, 2007;Guyon et al., 2002). We implemented recursive feature elimination procedure during the coarse tuning of BRT, RF, ANN, and SVR algorithm models. 335 The data preparation resulted in 138 variables. Of these, 76 variables were removed based on linear correlation and linear dependencies among variables. An additional 16 were removed following the recursive feature elimination procedure. The final data used for building the models had 46 variables (Table 2), of which five are meteoric, nine are reflectance variables, and 32 are topographic variables. Variable categories that had no importance included the topographic wetness index (TWI), the reflectance in the red-edge band, and NDVI. 340 https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License.

* Scale for raster products is pixel resolution in meters, and cumulative days for the meteoric variables. Topographic Position
Index scale is a combination of the inner-outer diameters in meters.

Data Description
The data collection days and summary site statistics are given in Table 3. Measurements were done on six different days spread throughout the 2018 water year. The cumulative precipitation and PET for the 2018 water year are shown in Figure S3 and 1 to 30-day rolling sums of precipitation and PET for the sampling dates are shown in Figure S4. The soil moisture measurement followed the general precipitation patterns but was also influenced by immediate rainfall events; the highest soil moisture occurred on the only measurement day where it had rained the day before (January 23, 2018). 355 The vegetation greenness, as measured with TTVI followed the 15-day cumulative rainfall well with maximum greenness occurring on April 4, 2018, and sharply decreasing the following two months ( Figure 5).

Machine Learning Procedure
The overall machine learning procedure is illustrated in Figure 1. The computationally demanding steps of model training and testing were run at the Multi-Environment Research Computer for Exploration and Discovery (MERCED) high-performance computing cluster, at the University of California, Merced. The caret R package (Kuhn, 2017) was used to handle training and 370 tuning procedures. The SVR and RVR algorithms were implemented using the kernlab package (Karatzoglou et al., 2004), RF algorithm was implemented using the randomForest package (Liaw and Wiener, 2015), and the BRT algorithm using the xgboost package (Chen and Guestrin, 2016).

Training-testing set splits
The data was split into training and testing sets of approximately 75-25 percent, respectively (i.e., 300 and 100 records). The 375 testing set was a hold-out set used only to evaluate final trained models.
For the testing set, two transects are randomly selected on four randomly selected sampling dates, and one transect is randomly selected on the remaining two sampling dates. To minimize bias that may result from the training-testing set split, we generated 30 unique training-testing set splits and trained 30 separate models based on each separate training set. The performance of each model was assessed on its respective testing set. Similar performance of the individual models would indicate that bias 380 due to the training-testing set split is minimal. The justification for this subsetting procedure is: (1) the selection of entire transects as testing sets avoids the possible data leakage between the training and testing sets due to spatial autocorrelation since samples in a transect are located close to each other-a simple random splitting would not avoid this potential problem; (2) all six sampling dates are represented in the training set-models are trained on the entire range of time and soil moisture changes; and (3) the testing set is between 25 to 30 percent of the data (between 100 to 125 samples). 385 The distribution of samples across the sampling dates and transects for the training and testing sets are shown in, Figures S7 and S8, respectively.
On average the training-testing split was 294 samples in the training sets and 113 samples in the testing sets. All the training sets have samples from all the six sampling dates and transects. While all sampling dates are represented in each testing set, on average, there are five transects in each testing set. 390

Cross-validation procedure
The selection of optimal model parameters in the model training process was done by the cross-validation method. Crossvalidation is done to estimate the test error rate by holding out a subset of the training data (i.e., validation set) from the fitting process and then applying the fitted model to predict the validation subset. A 30-fold cross-validation set was generated by https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License. randomly splitting the training data into 80-20 percent training-validation split by randomly selecting a single transect every 395 day. Optimum model parameters were selected using a comprehensive grid search method.

Performance
The final performance of models was assessed on the separate hold-out test dataset that was not used in the model training.
The performance of models is measured in terms of mean absolute error (MAE), mean bias error (MBE) and the coefficient 400 of determination ( 2 ) determined as follows: Where is the number of observations; is the measured value; ̂ is the predicted value; and ̅ is the mean of measured 405 values.
The MAE indicates the average deviation of predictions from the measured value with smaller values indicating better performance. The MBE measures the average systematic bias, positive or negative values indicate the average tendency of the predicted values to be larger or smaller than the measured values, respectively. The R 2 measures the correspondence between predicted and measured data with higher values indicating stronger correspondence. The MAE was chosen over RMSE as it 410 is a more appropriate measure when averaging (Willmott and Matsuura, 2005).

Variable importance
The predictor variable importance is the statistical significance of each predictor variable with respect to its effect on the generated model. For the tree-based models, RF and BRT, variable importance is calculated internally within the model algorithm (Equation 1). For the rest of the machine learning models, we calculated the predictor variable importance by 415 recursive feature elimination method, which is done by recursively removing predictors before training a model and evaluating the change in model performance. In this method, to account for possible bias in variable subset selection (Ambroise and https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License. , 2002;Hastie et al., 2009), we included a separate layer of 10-fold cross-validation to the entire sequence modeling steps.

Effect of predictor variables 420
The relationship between the predictor variables and outputs for a black-box model can be analyzed using model-independent methods such as partial dependence plots or accumulated local effects (ALE) plots (Apley, 2016;Greenwell, 2017). These plots help explain the relationship between the outcome of black-box supervised machine learning models and predictors of interest. We use the ALE plots to analyze the effect of selected predictor variables. Although similar, the ALE plots are preferred over partial dependence plots for their speed and their ability to produce unbiased plots when variables are correlated 425 (Apley, 2016). The value of the ALE is centered so that the mean effect is zero; it can be interpreted as the effect of the variable on the outcome at a certain value compared to the average prediction of the data. For example, an ALE estimate of -2 when a variable of interest has value 3, then the prediction is lower by 2 compared to the average prediction (Molnar, 2019).

Software Used
Preliminary UAS image processing-radiometric calibrations and, orthomosaic and DEM generation-is done using Pix4D 430 photogrammetry software (Pix4D, Lausanne, Switzerland). Raster rescaling, terrain analysis, and spatial data visualization are done in ArcGIS Pro software (ESRI, Redlands, CA, USA). The machine learning process-data preparation, model tuning, and prediction-were done in R (R Core Team, 2019).

Model Performance 435
Performances of the five machine learning algorithms we tested are presented in Figure 6. The BRT and RF algorithms had the best performance with MAE of less than 4 % volumetric water content. The relatively high accuracy of BRT and RF models is consistent with other studies that find ensemble decision-tree-based regression models perform better than many other machine learning algorithms (Caruana and Niculescu-Mizil, 2006); particularly in terrain and soil spatial predictions (Hengl et al., 2017(Hengl et al., , 2018Keskin et al., 2019;Nussbaum et al., 2018;Szabó et al., 2019). The RF model is much easier and faster to 440 train compared to the other machine learning algorithms used. Since the ensemble trees are independent in RF model, the 'forest' can be grown simultaneously, which dramatically increases processing efficiency in parallel computing. In addition, the RF model has few hyperparameters to tune. In contrast, the ensemble trees in the BRT algorithm must be grown sequentially since each new tree is dependent on the previous ensemble (which makes parallel processing challenging). Training a BRT model requires tuning multiple hyperparameters-seven in our implementation of the BRT model. 445 The individual performances of the 30 tuned BRT models on their respective testing sets are shown in Figure 7. On average the ensemble of the BRT models had MAE of 3.77 % across all 30 testing sets. Given that the locations of ground sampling https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License. 20 points were clustered in the six transects, selection of testing and training datasets can lead to spatial bias. To minimize such biases, the transects for the testing dataset were randomly for each sampling day. Comparison of the performance of models across the different training-testing splits suggests that the potential bias based on testing set selection was minimal. 450  The measured water contents from the testing data sets are compared with the prediction by all the trained models in Figure 8.

Predictor Variable Importance 465
The relative importance of predictors from the BRT model grouped by variable type (lumping variables regardless of variable specifications such as summation window for precipitation or pixel resolution for the raster) are shown in Figure 9. Notice that the only temporally dynamic variables are the meteoric and reflectance variables, the topographic variables are not timedependent and only need to be generated once for a study area. The top four important variables are precipitation, reflectance in the red band, PET, and TPI ( Figure 9). The four most important variables not grouped by type are the 15 and 30-day 470 cumulative precipitation, 30-day cumulative PET, and the red bands. TPI and flow accumulation are the most important of the topographic variables ( Figure S9). https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License.

Effect of Predictor Variables
We used the ALE plots to investigate the effects of TPI, curvature, and flow accumulation variables on soil moisture estimates ( Figure 10). Given the high importance of these topographic variables, we were interested in identifying the nature of each variable's relation with soil moisture and thresholds of large changes. Predicted soil moisture generally increased with flow 480 accumulation. At 1 m resolution DEM, soil moisture initially decreased as surface become less convex and increased as surface curvature transitioned from convex to concave (values -5 to +5). Above the value of about +5, soil moisture decreased with curvature. However, at lower scales (3, 5 and 50 m resolution DEM), soil moisture was maximum at convex to concave transition (near 0) but there was no decrease in moisture at higher concave values and with the lowest resolution (50 m DEM) soil moisture continued to increase with an increase in concavity of surface). TPI is very scale-dependent. However, TPI across 485 all scales had pattern relation with soil moisture. Negative TPI values indicate trends towards valleys, zero values indicate flat areas if the slope is shallow or mid-slope areas for areas with significant slope and positive TPI values indicate trends towards ridgetops (Jenness et al., 2013). Across all scales, there was a negative relationship between TPI and soil moisture at negative values and a positive relationship at positive TPI values; this indicates valleys and ridge tops were wetter than mid-slope areas.
The TPI scale that had was the highest variable importance was calculated with an inner diameter of 15 m and an outer diameter 490 of 35 m (TPI (15, 35)). At this scale, the increase in soil moisture moving from mid-slope to valley areas was most pronounced.
Topography has a strong control on soil moisture distribution at landscape scales (Sørensen et al., 2006). While the TPI was the most important topographic variable in determining soil moisture, the TWI was found to be not an important predictor.
Although the machine learning models are considered non-spatial models, that is, they do not consider sampling location https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License.
information and spatial autocorrelations (Georganos et al., 2019;Hengl et al., 2018). The inclusion of spatially dependent 495 variables (specifically: curvature, flow accumulation, and TPI) as predictors, however, means that the models do account for spatial information to an extent and this should make the predictions more spatially relevant.
The red band was the most predictive of the three bands. Reflectance in the red-edge was found to be not an important predictor.
The two greenness indices we tested, i.e., NDVI and TTVI, were not important even though their constituent bands, the red and NIR were important. The lower importance of NIR in perdition surface soil moisture was particularly surprising given the 500 sensitivity of NIR band to moisture to plant stress.  Table 2). Black curves represent individual effects of the 30 models, and red curves are smoothed trendlines overall individual models. Marks along the xaxis show the distribution of data in the model training set.

Spatial Prediction of Soil Moisture
Volumetric water content from the best BRT model was used to predict soil moisture for the test area for all six measurement days Figure 11. While the mean moisture content was mostly determined by the day, the distribution shows a visually similar distribution to topography. Ridges appear drier while valleys appear wetter. The distribution of soil moisture predictions over 510 the test area for each day is shown in Figure S10.

26
A close-up map of soil moisture prediction for January 23, 2018, is shown in Figure 12 and shows that soil moisture varies considerably with topography.

Conclusion and Outlook
This research serves as a proof of concept that surface soil moisture can be interpreted with reasonable accuracy from multispectral UAS remote sensing using machine learning methods. As a data mining technique, machine learning model performance and reliability are closely tied to the quantity of data. Although the number and spatial coverage of ground 525 sampling points were reasonable, the addition of more sampling points and some occasional measurements at random locations within the study area would have greatly helped to strengthen the reliability of the models. Multi-year studies are eventually needed to ensure that the model can be used for future predictions. The reflectance variables' dependence on the annual cycle of vegetation versus meteoric variables would be better resolved with multi-year studies. Although more challenging to implement, studies of deeper soil root-zone soil moisture are more ecologically relevant in semi-arid grasslands such as our 530 study site and should be considered. Expanding the reflectance information beyond multispectral bands could lead to important improvements in soil moisture prediction. Although lightweight thermal or hyperspectral sensors are currently very expensive https://doi.org/10.5194/hess-2020-271 Preprint. Discussion started: 18 August 2020 c Author(s) 2020. CC BY 4.0 License. and may not be financially feasible for routine applications at this time, those technologies might become more affordable in the future.
The possibility of using the high-resolution topographic variables from UAS with reflectance data from satellite remote sensing 535 is an interesting topic to study. This would mean the UAS would only be flown once in an area and future and past predictions of soil moisture can be estimated using satellite images and meteoric data from stations. This would also be ideal to downscale satellite data by integrating high-resolution topographic information from UAS remote sensing. Although the machine learning models are high performing and generalizable models, they are non-spatial models that do not consider sampling location information and spatial autocorrelations. This can potentially compromise the model's ability to appropriately address spatial 540 heterogeneity (Georganos et al., 2019;Hengl et al., 2018). Hengl et al. (2018) introduce a novel method to incorporate spatial information into a non-spatial machine learning model by including distances between sampling points as predictor variables, and they show this method (although still in its formative stage) has comparable accuracy to kriging methods. The potential to improve soil moisture predictions by using such spatially integrated methods should be considered in future research.