Improving Calibration and Validation of Cosmic-Ray Neutron Sensors in the Light of Spatial Sensitivity – Theory and Evidence

In recent years the method of cosmic-ray neutron sensing (CRNS) has gained popularity among soil hydrologists, physicists, and land-surface modelers. The sensor provides continuous soil moisture data, averaged over several hectares and tens of decimeters depth. However, the signal may still contain unidentified features of hydrological processes, and many calibration datasets are often required in order to find reliable relations between neutron intensity and water dynamics. Recent insights into environmental neutrons accurately described the spatial sensitivity of the sensor and thus made it possible to quantify the contribution of individual sample locations to the CRNS signal. Consequently, data points of calibration and validation datasets are suggested to be averaged using a more physically-based weighting approach. In this work, a revised sensitivity function is used to calculate weighted averages of point data. The function differs from the simple exponential convention by the extraordinary sensitivity to the first few meters around the probe, and by dependencies on air pressure, air humidity, soil moisture, and vegetation. The approach is extensively tested at six distinct monitoring sites: two sites with multiple calibration datasets and four sites with continuous time series datasets. In all cases, the revised averaging method improved the performance of the CRNS products. The revised approach further helped to reveal hidden hydrological processes which otherwise remained unexplained in the data or were lost in the process of overcalibration. The presented weighting approach increases the overall accuracy of CRNS products and will have impact on all their applications in agriculture, hydrology, and modeling.

After the measurement method was first presented by Zreda et al. (2008), many studies were dedicated to calibrating the sensors and to assessing their performance in comparison to conventional instruments (e.g., Rivera Villarreyes et al., 2011; Franz et al., 2012a; Coopersmith et al., 2014; Hawdon et al., 2014; Almeida et al., 2014). These studies showed a good agreement between neutron intensity and independent soil moisture observations. However, outstanding features were also reported in the CRNS data which did not fit well to the accepted theory described by Desilets et al. (2010). Authors suggested that additional hydrological processes and hydrogen pools could influence the signal (e.g., Franz et al., 2013a; Baatz et al., 2014; Baroni and Oswald, 2015), while others applied recalibration of semi-physical parameters to better fit individual site conditions (e.g., Rivera Villarreyes et al., 2011; Lv et al., 2014; Iwema et al., 2015; Heidbüchel et al., 2016b). Despite the unambiguous improvements obtained by corrections and recalibration approaches, some features in many datasets still could not be explained using current knowledge and consequently seemed to be unrelated to hydrological processes.
To address some of these knowledge gaps, Franz et al. (2012b) investigated soil hydrological processes with water transport simulations and found that wetting and drying cycles are non-uniquely represented by the CRNS signal. Due to the integrative neutron signal, those hysteresis effects can be most significant when sharp wetting or drying fronts are shaping the soil water profile. As a consequence, Franz et al. (2012b) and Franz et al. (2013b) recommended vertical weighting of point measurements in the profile to account for these effects. Furthermore, Franz et al. (2013b) also demonstrated that the sensor could underestimate average soil moisture by up to ten volumetric percent depending on the horizontal distribution of water content in the footprint. They concluded that exact knowledge of the heterogeneity is a prerequisite for the interpretation of neutron count rates, and distance-weighting procedures are necessary to obtain sufficient performance during calibration and validation with point data. In order to average calibration and validation data horizontally, Franz et al. (2012a) adopted a sampling scheme based on initial calculations by Zreda et al. (2008) to give every sample an equal weight. The resulting sensor locations at 25, 75, and 200 m correspond to an almost exponential horizontal weighting function. Bogena et al. (2013) were the first to apply this horizontal weighting to an irregularly distributed point sensor network, albeit indirectly by fitting the cumulative variant. Nevertheless, many researchers still avoid horizontal weighting by virtually relocating their irregularly distributed point sensors to the nearest radius of 25, 75, or 200 m in post-processing mode (e.g., Franz et al., 2012a). In complex terrain, only a few calibration or validation locations are accessible and their individual contribution to the neutron signal has been unknown for a long time.
Eventually, an overall improvement of the CRNS data could help to identify hydrological effects more accurately (such as precipitation, ponding, evapotranspiration, and infiltration processes).

The paper is structured as follows: Firstly, we present the equally weighted, the conventional, and the revised formulation of the spatial sensitivity function (also called weighting function). We then provide a procedure to generate a weighted average of point measurements that can be compared with the CRNS product. The assumptions and uncertainties of this approach are then discussed, followed by a short description of measures used to evaluate the calibration and validation performance, and short descriptions of the studied sites. In the results section we present and discuss the sensor performance using the equal, conventional, and revised weighting approaches for calibration campaigns at two different sites, and for time series data at four sites.
For calibration and validation purposes, the water equivalent in the footprint volume is typically determined independently by an average of point measurements, for example from gravimetric samples or data from soil moisture monitoring networks.
However, those locations can contribute differently to the apparent average of soil moisture as seen by the neutron detector, for example, depending on their distance $r$ from the CRNS probe and their depth $d$ below the soil surface. Depending on their individual contributions, different weights can be assigned to each data point in the calculation of a so-called weighted average.
Among the variety of weighting concepts in the literature, we have selected two of the main and most frequently used strategies from recognized publications, which are based on distinct physical assumptions. On the one hand, the conventional approach covers the main strategies applied so far (Franz et al., 2012b; Bogena et al., 2013). On the other hand, a revised weighting approach has been used which is based on recent findings from Köhli et al. (2015) and which has been further advanced in the present work by the following points: extension of the analytical fit of the radial sensitivity function $W_r$ to short distances, $r \le 0.5$ m, and added dependency of the weighting functions on air pressure $p$ and vegetation height $H_\mathrm{veg}$ by introducing a rescaled distance $r^*(r, p, H_\mathrm{veg}, \theta)$.
The neutron transport model URANOS has been updated accordingly to provide advanced analytical functions for the spatial sensitivity (URANOS 0.97, available from www.ufz.de/uranos). These advancements generalize the applicability of the results from Köhli et al. (2015) and are recommended for future applications. Please refer to Appendix A for detailed explanations. There are certainly more factors that influence the shape of the neutron sensitivity, for example the height of the detector above ground, different plant species, and large objects. However, those factors are irrelevant for the investigated sites and are thus of minor importance for the conclusions in this manuscript. They should be investigated in future work.
In addition to the conventional and the revised approach, this work includes the equal average weighting strategy (weights equal 1) to compare the performance when the CRNS signal is intuitively treated as a large-area averaging soil moisture product, as was done especially at macroscopically homogeneous sites.
2.1 Vertical weighting in the soil profile

Simulations by Zreda et al. (2008), Franz et al. (2012b), and Köhli et al. (2015) have shown that the neutron signal integrated over a vertical soil column exhibits the highest sensitivity to the uppermost layers. Therefore, independent soil moisture measurements taken at different depths, $d$, need to be weighted differently in order to account for the underlying physical processes.
In contrast, the revised vertical weighting function, $W_d$, takes the full soil profile into account (as neutrons do) and considers the fact that the effective depth decreases with increasing distance $r$ from the detector. Here, $D$ denotes the effective penetration depth, defined as the depth within which 86 % of neutrons probed the soil (see Köhli et al., 2015). These relations are based on URANOS simulations and follow recent insights about the physics of neutron transport and detection near the soil-atmosphere interface. Based on the formulation from Köhli et al. (2015), the calculations of the revised penetration depth $D \equiv D_{86}$ now add the dependency on air pressure and vegetation height, expressed in the scaled distance term $r^*$ (see Appendix A).

Figure caption (fragment): ... and $D^\mathrm{conv}(\theta)$, respectively. On average, both approaches follow a similar shape; however, the conventional formulation is independent of distance $r$ and soil bulk density $\varrho_\mathrm{bulk}$. (b) Normalized vertical weighting functions (eqs. 3 and 4) based on 12 sample points.

Horizontal weighting in the footprint area
In this work we make use of three horizontal weighting functions to average soil moisture measurements at distances $r$ from the CRNS probe. First, the equal average (weights equal 1), which was usually applied for validation with soil moisture networks and remote sensing products. It was also used for calibration datasets if locations were arranged according to the COSMOS standard sampling scheme (25 m, 75 m, 200 m), such that the samples automatically represent areas of equal contribution to the neutron signal. These calculations were based on a simple exponential sensitivity function (Zreda et al., 2008) and presented by Franz et al. (2012a) and Zreda et al. (2012).
Second, the conventional weighting approach uses an (almost) exponential sensitivity function based on Monte-Carlo simulations from Zreda et al. (2008). It is implicitly referred to when using the COSMOS standard sampling scheme (Zreda et al., 2012). An analytical form of the conventional horizontal weighting function has never been published. However, it can be derived from the cumulative function CFoC($r$), presented by Bogena et al. (2013, eq. 13), who fitted data from Zreda et al. (2008, Fig. 3) in the domain of $r \le 300$ m with the fit parameters $a_i = 1.311 \cdot 10^{-2},\ 9.423 \cdot 10^{-5},\ 3.2 \cdot 10^{-7},\ 3.95 \cdot 10^{-10}$ (eq. 5). To account for the remaining contribution beyond 300 m, the (usually few) data points have been assigned the weight $W^\mathrm{conv}_{r>300} = W^\mathrm{conv}_{r=300}$. One of the major shortcomings of this exponential approach is the underestimation of the high sensitivity of the neutron signal to the first few meters around the sensor.
As a third strategy, we use the revised weighting approach based on URANOS simulations and corresponding analytical fits (see Köhli et al., 2015, for details). New technical advancements of this function include the dependency on air pressure $p$ and humidity $h$ by introducing the rescaled distance $r^*$, as well as the extension to short distances, $r \le 0.5$ m.
Parameter functions $F_i$, their corresponding parameters, the formulation of the rescaled distance $r^*(r, p, H_\mathrm{veg}, \theta)$, as well as further explanations are given in Appendix A.
from the CRNS probe. In each profile, point measurements of volumetric water equivalent $\theta_{P,L}$ are given at various layers $L$ of depth $d_L$. Observations of air pressure $p$, air humidity $h$, and vegetation height $H_\mathrm{veg}$ are given at the time of interest, while estimations of soil bulk density $\varrho_\mathrm{bulk}$ exist for every profile (or even every sample). The general function to calculate an average of point measurements $i$ with values $\theta_i$ and weights $w_i$ is given as:

$$\mathrm{wt}(\theta, w) = \frac{\sum_i w_i\,\theta_i}{\sum_i w_i}$$

The procedure to obtain a weighted average of soil water equivalent, $\langle\theta\rangle$, is described as follows (see also Fig. 3):

1. Estimate an initial value $\langle\theta\rangle = \mathrm{wt}(\theta_{P,L}, 1)$ by an equally weighted average over all profiles $P$ and layers $L$.
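The weighted average used in the first step can be sketched in a few lines of Python. This is only an illustration of the general averaging function: the name `wt` follows the notation of the text, while the input checks and the sample values are assumptions made for this example and are not taken from the study.

```python
from typing import Sequence


def wt(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average sum(w_i * theta_i) / sum(w_i) of point measurements."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of weights must be positive")
    return sum(w * v for v, w in zip(values, weights)) / total_weight


# Hypothetical point measurements of volumetric water content (m^3/m^3).
theta_points = [0.20, 0.30, 0.40]

# Step 1 of the procedure: equally weighted initial estimate (all weights 1).
initial_estimate = wt(theta_points, [1.0] * len(theta_points))
```

Any depth or distance weighting enters only through the `weights` argument, so the same function can serve both the vertical and the horizontal averaging.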
The above procedure weights each data point $\theta_{P,L}$ according to its depth $d$ and distance $r$ from the CRNS probe. However, when a finite number of sample points is chosen, assumptions are involved about the spatial domain they represent. Depending on knowledge about the individual field conditions, interpolation between soil layers, for instance, is a good option to assign each measurement to a certain soil horizon. Let $\Omega(r, \vartheta)$ [m³] be the spatial domain of the footprint volume in polar coordinates, $\Omega_P$ [m²] the horizontal representative area of the profile $P$, and $\Omega_L$ [m] the representative soil horizon of the measurement at layer $L$. As each measurement $\theta_{P,L}$ represents the volume $\Omega_P \cdot \Omega_L$, its weighted contribution to the neutron signal should be integrated over this domain, horizontally over $\Omega_P$ for profile $P$ and vertically over $\Omega_L$ for layer $L$. For example, if soil samples were taken at two depths, 10 cm and 40 cm, it could be reasonable to integrate their weights from $d = 0$ to 30 cm and from 30 to 50 cm, respectively. In the horizontal space it might sometimes be reasonable to integrate a single profile measurement over the whole area of similar soil and land use type (as has been done in section 4.4). If sample locations were arranged in an interpolated, regular grid (see e.g., pixels of size 1 m in Fig. 10), then each pixel should be weighted individually as a point such that the integrals above can be simplified. While an infinitesimal point at distance $r$ has the weight $W_r/(2\pi r)$, a regular pixel of size $s$ at that distance weighs $W_r/(2\pi r) \cdot s \propto W_r/r$. For all radially symmetric sampling schemes, where each point measurement represents one of $n$ circular sectors, the sector at distance $r$ has the size (arc length) of $2\pi r/n$, and thus contributes the weight $W_r/(2\pi r) \cdot 2\pi r/n \propto W_r$.

The strategy to take into account estimations of representative volumes initially appears to be more realistic. However, the extrapolation of data points involves assumptions on the site-specific heterogeneity and thereby on the strategy of interpolation. It further requires expert knowledge about the individual field conditions. During the preparation of this work, we found that the usage of weights for distinct measurement points provided fair approximations of the integrals, i.e. $W_{r,P} \approx w_P$ and $W_{d,L} \approx w_L$, and eventually resulted in very similar averages, $\langle\theta\rangle$, throughout all cases investigated (not shown).

Let $S$ be the domain of the representative volume of the sample locations (e.g., the areal extent of the soil moisture monitoring network), and let $\Omega$ be the spatial domain of the CRNS footprint as defined in the previous section. Then, the outer region $\Omega \setminus S$ denotes the part of the footprint domain which is not represented by the samples. The contribution of the "sample area" $S$ to the neutron signal can range from 0 to 100 % and depicts the fraction of detected neutrons which carry information from (i.e., had contact with) the sample area $S$. Assume that the observed soil moisture in $S$ is on average $\langle\theta\rangle$, and that the soil moisture in the outer region, $\Omega \setminus S$, can be estimated as $\langle\theta\rangle \pm \Delta\theta$.
The propagation of this error through $W_r(h, \theta)$ leads to an uncertainty $\Delta N$ of the total neutron signal $N$, and eventually adds uncertainty to the CRNS product, $\theta(N \pm \Delta N)$. In this manuscript, this estimation is applied exemplarily to the Schäfertal site (section 4.2) in order to quantify the errors introduced by incomplete coverage.
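To illustrate how the contribution of a sample area to the neutron signal could be estimated, the following sketch integrates a radial weighting function numerically. It rests on strong assumptions that are not part of this study: the sample area is taken as a disc centred on the probe, and `radial_weight` is a hypothetical exponential placeholder (with an arbitrary scale of 100 m) standing in for the revised function $W_r$ of Appendix A.

```python
import math
from typing import Callable


def radial_weight(r: float, scale: float = 100.0) -> float:
    """Hypothetical placeholder for W_r: a simple exponential decay.

    This is NOT the revised sensitivity function of the paper.
    """
    return math.exp(-r / scale)


def integrate(f: Callable[[float], float], a: float, b: float,
              steps: int = 10000) -> float:
    """Composite trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


def coverage_fraction(r_sample: float, r_footprint: float,
                      weight: Callable[[float], float] = radial_weight) -> float:
    """Fraction of the signal that originates within radius r_sample,
    relative to the whole footprint of radius r_footprint."""
    return integrate(weight, 0.0, r_sample) / integrate(weight, 0.0, r_footprint)


# Assumed example: samples cover the inner 50 m of a 300 m footprint.
fraction = coverage_fraction(50.0, 300.0)
```

With a realistic $W_r$, which is far more sensitive to the first few meters, the fraction for the same radius would be expected to differ from this placeholder result.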

Performance Measures
To evaluate the performance of time series and calibration data, we apply prominent measures used in environmental and hydrological research. The robustness of this approach is evaluated by applying different performance measures, which is a common strategy to falsify new methodological approaches (see e.g., Glaser et al., 2016). Popular efficiency measures are the Nash-Sutcliffe-Efficiency (NSE) (Nash and Sutcliffe, 1970) and the more modern Kling-Gupta-Efficiency (KGE) (Gupta et al., 2009), while the Root-Mean-Square-Error (RMSE) and the Pearson correlation coefficient ($\rho$) are well-established standard approaches.
Here, $A = \theta(N, N_0)$ denotes the water equivalent measured by the CRNS ($N_0$ needs to be calibrated), $B$ denotes the actual field soil water equivalent, $\theta$, measured by independent instruments, and $\langle x \rangle = \frac{1}{n}\sum_{1}^{n} x$ denotes the average (expected value) of a set of data points $x$. In the ideal case of optimal agreement between the variables $A$ and $B$, the measures would reach NSE = 1, KGE = 1, RMSE = 0, and $\rho = 1$.
NSE normalizes the mean squared error by the observed variance, where the mean observed variable $B$ is used as a baseline. Following this approach, site-specific variations could translate to biased estimation of model skills among different sites. On the other hand, the KGE measure is a revised version of NSE that allows the relative importance of linear correlation $\rho$, variability $\sigma$, and bias $\langle\cdot\rangle$ of simulated and observed variables to be analyzed (Gupta et al., 2009). RMSE is simply a measure of the differences between two time series but is prone to biased datasets and outliers. The correlation $\rho$ is an accepted approach in experimental geophysics to identify similar or unknown effects (e.g., Fu et al., 2015) in two time series. However, if many factors could explain a single observation, only using the correlation measure may lead to false recognition of coincidental effects.
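The four measures discussed above can be sketched as follows, using their standard textbook definitions (Nash and Sutcliffe, 1970; Gupta et al., 2009). The function and variable names are chosen for this illustration only, with `a` for the CRNS product and `b` for the independent observations; the example values are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square error between two series."""
    return math.sqrt(fmean([(x - y) ** 2 for x, y in zip(a, b)]))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (population form)."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = fmean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])
    return cov / (pstdev(a) * pstdev(b))


def nse(a: Sequence[float], b: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency with the mean of the observations b as baseline."""
    mean_b = fmean(b)
    residual = sum((x - y) ** 2 for x, y in zip(a, b))
    variance = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - residual / variance


def kge(a: Sequence[float], b: Sequence[float]) -> float:
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    rho = pearson(a, b)
    alpha = pstdev(a) / pstdev(b)   # variability ratio
    beta = fmean(a) / fmean(b)      # bias ratio
    return 1.0 - math.sqrt((rho - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical observations and a CRNS product with a constant wet bias of 0.1.
observed = [0.1, 0.2, 0.3, 0.4]
biased = [value + 0.1 for value in observed]
```

In this constructed case the correlation stays at 1 although the series are biased, which illustrates why the correlation alone is not a sufficient measure.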
In the following analysis, we have thus optimized the KGE value between the CRNS and the independent soil moisture data to find a single calibration parameter $N_0$ per site.
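A minimal sketch of such a calibration is given below. It assumes that the relation between neutrons and soil moisture follows the commonly used form of Desilets et al. (2010), $\theta(N) = \varrho_\mathrm{bulk}\,(a_0/(N/N_0 - a_1) - a_2)$ with $a_0 = 0.0808$, $a_1 = 0.372$, $a_2 = 0.115$, and it replaces whatever optimizer was used in this study by a plain grid search. The bulk density of 1.4 g/cm³, the candidate range, and the synthetic data are assumptions for illustration only.

```python
import math
from statistics import fmean, pstdev

A0, A1, A2 = 0.0808, 0.372, 0.115  # shape parameters after Desilets et al. (2010)
BULK_DENSITY = 1.4                 # g/cm^3, assumed value for this example


def theta_from_neutrons(n, n0):
    """Volumetric water content from neutron counts n and calibration parameter n0."""
    return (A0 / (n / n0 - A1) - A2) * BULK_DENSITY


def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009)."""
    mean_s, mean_o = fmean(sim), fmean(obs)
    cov = fmean([(s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)])
    rho = cov / (pstdev(sim) * pstdev(obs))
    alpha = pstdev(sim) / pstdev(obs)
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((rho - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def calibrate_n0(neutrons, theta_obs, candidates):
    """Grid search for the n0 that maximizes KGE between CRNS and observations."""
    best_n0, best_score = None, -math.inf
    for n0 in candidates:
        if min(neutrons) / n0 <= A1:
            continue  # outside the valid domain of the relation
        sim = [theta_from_neutrons(n, n0) for n in neutrons]
        score = kge(sim, theta_obs)
        if score > best_score:
            best_n0, best_score = n0, score
    return best_n0, best_score


# Synthetic test data generated with a "true" n0 of 1000 counts per hour.
theta_obs = [0.10, 0.20, 0.30, 0.40]
neutrons = [1000 * (A0 / (t / BULK_DENSITY + A2) + A1) for t in theta_obs]
best_n0, best_score = calibrate_n0(neutrons, theta_obs, range(800, 1201))
```

In practice, `theta_obs` would be the weighted average of the point measurements, so the choice of weighting function directly affects the resulting $N_0$.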
Agricultural site in the lowlands of Braunschweig (GER). The second calibration site is an irrigated agricultural field in the northern lowland of Germany near Braunschweig, at an elevation of 60 m asl. Annual precipitation is 620 mm and the average temperature is 9.2 °C. The 12 ha area is irrigated in 50 m wide strips with pre-treated waste water, as the sandy soils exhibit low water and nutrient holding capacity. The CRNS probe was located in the center of the field (52.3587° N, 10.4004° E) and several FDR devices provided point measurements of soil moisture. In 2014 the field was cropped with maize (Zea mays), which was drilled in mid-April and harvested on 27 September.
The hillslopes and creek in the Schäfertal (GER). The intensive monitoring site Schäfertal (11°03′ E, 51°39′ N, 395 m asl) is an agriculturally used catchment in the middle-mountain area of the Harz mountains in Central Germany (Zacharias et al., 2011; Wollschläger et al., 2016). Parts of the hillslope grassland transect are equipped with a wireless soil moisture monitoring network. It has a spatial extent of ca. 240 × 40 m and comprises a north- and a south-exposed slope as well as a valley bottom crossed by a creek oriented west to east. Silty-loam Cambisols occupy the slopes whilst finer-textured and highly organic soils evolved in the riparian zone between the footslope and the creek (Martini et al., 2015).
The data weighted with the revised functions demonstrate that the lines inferred from the calibration points converge much closer to a single theoretical line (Desilets et al., 2010). Although this approach almost removes the unrealistic effect of reduced hydrogen pools, the assumption of a single calibration parameter $N_0$ must be considered illegitimate due to significant biomass dynamics in the investigated period. The remaining deviation of the three calibration curves still indicates a small water reduction effect; however, its magnitude is insignificant given the observational uncertainty of the neutron counter. It remains an open question whether a revision of the parameters of eq. 1 would better catch the local dynamics and further contribute to the interpretation of the signal. Nevertheless, the example shows that the revised weighting strategy contributes to a more realistic interpretation of the water availability from CRNS measurements, which is especially important when used in conjunction with irrigation management.
Consequently, the partial coverage of the CRNS footprint by the irregularly distributed SoilNet hampers the proper evaluation of the CRNS data, and especially of the weighting strategies.

Identification of additional hydrological processes
The pasture site Grosses Bruch is a good example of how an inappropriate averaging approach could hinder sufficient interpretation of time series data. Fig. 8 shows the soil moisture signal predicted from a stationary CRNS probe and the weighted signal of a soil moisture monitoring network (SoilNet) with sensors installed in depths from 0.05 m up to 0.6 m. Following the precipitation events in the second half of October, the shallow groundwater and loamy texture allowed large water ponds to reside permanently in the outer regions of the SoilNet (light blue indication on the map). As distant areas contribute much less to the CRNS signal than closer ones, the revised weighting approach has significantly reduced the influence of the saturated point data on the apparent CRNS average. Without the revised method, the CRNS product would have overestimated the absolute volumetric field saturation by more than 5 %. Additionally, beginning in mid-September, many cows had been present at this site, which are assumed to have led to large variations of the neutron signal and thus to a non-meaningful expression of correlation-related measures.
In the Wüstebach forest site, weighted averaging of the soil moisture monitoring network is performed based on the data presented in Bogena et al. (2013). The analysis shows three interesting effects on the resulting soil moisture signal in Fig. 9.
Firstly, the signal processed with the revised weighting approach (blue) is wetter than the conventionally weighted signal (orange). This effect is reasonable due to the higher soil water contents of the groundwater-influenced riparian zone, where the CRNS is located, compared to the terrestrial soils at the hillslopes. Secondly, the CRNS signal which was calibrated to the revised weighted soil moisture (light blue) outperforms the signal that was calibrated on the conventionally weighted soil moisture (light orange). This performance gain is robust in terms of the four measures. In order to avoid incorrect conclusions from overcalibration of the data during rain events (periods of high interception water), we repeated the same analysis for dry periods only. In this case the revised approach again led to the highest performance (not shown) and confirmed the robustness of this approach. Thirdly, differences between CRNS and SoilNet appear to be significantly more prominent for the revised approach (blue) in periods following huge precipitation events (May, July, and October). Those periods can probably be attributed to expected canopy water storage, interception storage, groundwater rise, and nearby accumulation of ponds. Ponded water in local hollows, trenches, and the litter layer is not visible in the soil profiles of the monitoring network, which are typically installed in solid and elevated ground. In contrast, its effect can be visible in stronger oscillations and shifts of the CRNS signal.

Summary of the analyzed research sites
The experimental sites used in this study and the corresponding gains for environmental and hydrological research are summarized in Table 3.
To give further advice on a reasonable distribution of points for homogeneous terrain, sampling radii $R_i$ of concentric circles could be calculated as follows.
First, select a total number of circles $n$ based on prior knowledge about the patterns at the individual site. Since the signal contribution of an area between any radii can be calculated by integrating $W_r$ (compare also Köhli et al., 2015, eq. 1), the $n$ borders of equal areal contribution, $r_i$, $i \in (1, ..., n)$, can be calculated by solving the corresponding integral. Then, the sampling radii $R_i$ can be selected anywhere between $r_i$ and $r_{i+1}$, as they are assumed to represent the area of the corresponding homogeneous annulus. A simple guideline could be to set the sampling radius in the geometrical center of each annulus, where the last sampling distance $R_n$ could be set to any point that is expected to represent the whole area beyond $r_n$.
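The selection of sampling radii described above could be sketched as follows. Several choices here are assumptions rather than the method of this study: `radial_weight` is a hypothetical exponential placeholder for $W_r$, the footprint is truncated at an arbitrary `r_max`, the first border is set to $r_1 = 0$, and the "geometrical center" is interpreted as the arithmetic midpoint between two neighbouring borders.

```python
import math
from typing import Callable, List


def radial_weight(r: float, scale: float = 100.0) -> float:
    """Hypothetical placeholder for W_r (NOT the revised function of the paper)."""
    return math.exp(-r / scale)


def equal_contribution_borders(n: int, r_max: float = 1000.0, steps: int = 100000,
                               weight: Callable[[float], float] = radial_weight
                               ) -> List[float]:
    """Borders r_1..r_n at which the cumulative integral of W_r reaches
    0, 1/n, ..., (n-1)/n of the total signal within r_max."""
    h = r_max / steps
    cumulative = [0.0]
    for k in range(steps):  # cumulative trapezoidal integration
        cumulative.append(cumulative[-1]
                          + 0.5 * (weight(k * h) + weight((k + 1) * h)) * h)
    total = cumulative[-1]
    borders, k = [], 0
    for i in range(n):
        target = total * i / n
        while cumulative[k + 1] < target:
            k += 1
        # linear interpolation inside the integration cell
        frac = (target - cumulative[k]) / (cumulative[k + 1] - cumulative[k])
        borders.append((k + frac) * h)
    return borders


def sampling_radii(borders: List[float]) -> List[float]:
    """Midpoints R_i between neighbouring borders; R_n is left to the user."""
    return [0.5 * (inner + outer) for inner, outer in zip(borders, borders[1:])]


borders = equal_contribution_borders(3)
radii = sampling_radii(borders)
```

The outermost radius $R_n$ is deliberately not computed, since the text leaves it to any point that is expected to represent the area beyond $r_n$.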

Is this strategy still robust against complex terrain and variable weather? Field sites differ in terms of spatial heterogeneity and variability due to terrain features or highly heterogeneous correlation lengths of soil moisture patterns.
Hence, implementing a strict, universal sampling scheme often is neither feasible nor meaningful with regard to individual conditions in the field. In this study the application of the revised weighting approach led to improved CRNS performance at all sites and for regular and irregular sampling designs. Apparently, the presented weighting procedure is robust across various sites, sampling configurations, and wetness conditions. An advantage of the approach is its straightforward applicability, which essentially applies a simple distance-weighted average to a set of data points, and does not require additional, complex analysis or interpolation strategies. The only assumption made is that each sample point represents an equal area in the footprint. Apart from sophisticated optimal sampling designs, three of the simplest sampling strategies are (1) regular grids, (2) random locations, and (3) locations that represent stable patterns (of soil moisture or land cover). However, judgment about their performance is far beyond the scope of this work. In any case, it could be recommended to reduce the uncertainty of locations close to the detector (e.g., by taking repeated measurements), because neutron theory has shown that the CRNS signal is most sensitive to nearby locations.
A simple and pragmatic way to design a reasonable sampling scheme could be to choose sensor locations based on the approximated horizontal sensitivity function $W^*_r$ (Appendix B). As this function does not depend on dynamic changes of surrounding hydrogen pools, an equal average would be sufficient in post-processing mode. However, the dependence on air humidity $h$ and soil moisture $\theta$ will introduce temporal errors to this approach. In this case it could be recommended to correct the equal average with its dynamic variability, which can be expressed as the variation of $W_r(h, \theta)$ around its mean, $W^*_r$. To circumvent a potential bias introduced by arbitrarily distributed locations, it could be better to apply different zonation approaches or interpolation methods (e.g., Kriging in polar coordinates) before each cell of the interpolated grid is weighted. However, this always comes with additional assumptions. For example, in the sampling strategy presented in section 4.4 certain soil moisture patterns in the field were categorized as four areas of different land use which were expected to behave equally in the footprint in terms of soil water dynamics. The horizontal weighting was then applied to those measurements depending on the location of the contributing area in the footprint. In our opinion this method probably provides the highest accuracy in most cases, although it requires prior knowledge about the distribution of soil type compartments in the footprint.

This study has focused on the theory and application of the averaging approach. The performance of different interpolation strategies deserves a study of its own, as it always depends on the local structures and correlation lengths of soil moisture.
3. Although existing data can be weighted in post-processing mode, missing locations close to the detector as well as insufficient coverage of the CRNS footprint introduce significant uncertainty. It can be quantified with the help of the radial sensitivity functions, as has been presented in sections 2.4 and 4.2.
4. Sampling strategies that are based on concentric rings can only be recommended for homogeneous terrain (where each sampling location is known to contribute equally to the signal) and should be adapted to the local site conditions (air pressure, humidity, soil moisture, vegetation cover). If the samples are arranged according to eq. 8, their equally weighted average would provide a value that is comparable to the CRNS product. On the other hand, if the footprint is covered by heterogeneous soil and land use patterns, the sample locations should be adapted to distinct representative clusters, which in turn should then be weighted based on their areal contribution to the signal (see section 4.4).
5. Data points in the first 0 to 10 m radius and 0 to 20 cm depth around the sensor are most important for calibration and validation purposes. It is thus recommended to reduce the uncertainty of those measurements, e.g., by avoiding flints in the samples, or by increasing the number of samples in that area.

The revised weighting functions presented here are provided in the supplementary material in R, MATLAB, and Excel (see Appendix C). Furthermore, an approximated weighting function $W_r^*$ (Appendix B) has been suggested to simplify quick analysis of the horizontal contributions independently of the local wetness conditions. However, the latter approach should be used with care, for its adequate performance has not been sufficiently confirmed in this work.
Within this study many datasets have been reanalyzed to test the revised weighting approach. Due to its overall success, it is recommended to revisit also other studies, especially where the conventional approaches have not led to the expected results (e.g., Franz et al., 2012a; Almeida et al., 2014; Iwema et al., 2015). In the light of the discussion provided, we recommend future studies to improve the sensor performance even further, for example by investigating the effect of different sampling designs and interpolation strategies, or by recalibrating the parameters of the theoretical line, $N(\theta)$. Specific URANOS simulations of the neutron distribution at the individual sites can further help to identify the contribution of different parts of the footprint to the detector signal.
On the basis of the results gained by this study and in the light of the conclusions above, it can be deduced that CRNS stations placed in mostly homogeneous terrain offer the highest interpretability of their field-scale signal. This is a feature that the CRNS method has in common with many other hydrometeorological instruments, like weather stations (Jarraud, 2008) or eddy covariance towers (Rebmann et al., 2005). However, even in complex terrain CRNS probes are capable of capturing hydrogen pools that otherwise would be very difficult to monitor (e.g., ponding, interception), while their sensitivity to specific parts of the footprint can be quantified with the help of $W_r$. Thereby, the present study demonstrates a way forward to a better understanding of the spatial contributions to the neutron signal, and elaborates the potential of cosmic-ray neutron sensors to quantify hydrological features that are almost impossible to capture with conventional instruments.
Appendix A: The revised weighting functions
Analysis has been performed to investigate the dependency of the sensitivity functions on other environmental variables, and relations have been found that do not further complicate the analytical formulations of $W_r$ and $W_d$. The weighting functions can easily adapt to variations of air pressure $p$ and vegetation height $H_\mathrm{veg}$ by scaling their argument $r$ with the scaling rules of the footprint radius $R_{86}$ (cf. Köhli et al., 2015, eqs. 4-6):
$$W_r(h, \theta, p, H_\mathrm{veg}) \approx W_{r^*}(h, \theta), \quad W_d(\theta, r, p, H_\mathrm{veg}) \approx W_d(\theta, r^*), \quad \text{where} \quad r^*(r, p, H_\mathrm{veg}, \theta) = r \,/\, F_p \,/\, F_\mathrm{veg}(H_\mathrm{veg}, \theta). \tag{A1}$$
Fig. 12 shows that this approximation performs well for various wetness conditions, as simulated curves and pressure-adapted curves are almost parallel (relative agreement is sufficient, as weighting functions are typically applied in a relative mode).
Moreover, the data analysis in this work sometimes requires realistic weights to be applied for samples located within $r < 0.5$ m, which is by definition an invalid range for $W_r(h, \theta)$ as reported by Köhli et al. (2015). We therefore extended the horizontal weighting function to the range below 0.5 m by introducing an additional exponential factor in eq. 6, which accounts for the steep increase near the detector. This peak has geometrical reasons and essentially comes from the fact that (1) only few neutrons can originate from small radii ($W_{r \to 0} \to 0$), and (2) the neutrons coming from higher radii have a lower chance to hit the detector ($W_{r \to \infty} \to 0$).

Appendix B: A simplified approximation
As the analysis in this work has shown, the conventional horizontal weighting function can underrate soil moisture near the sensor by factors of up to 25. Furthermore, the variability of the radial weighting function $W_r(h, \theta)$ with environmental conditions can have significant influence on the soil moisture average where accuracy matters. In cases where simplicity and computational efficiency are criteria, an approximated weighting function $W_r^*$ can be proposed, which is an averaged formulation over dry and wet conditions:
$$W_r^* \approx \begin{cases} \left(30\, e^{-r^*/1.6} + e^{-r^*/100}\right)\left(1 - e^{-3.7\, r^*}\right), & 0\ \mathrm{m} < r \le 1\ \mathrm{m} \\ 30\, e^{-r^*/1.6} + e^{-r^*/100}, & r > 1\ \mathrm{m}. \end{cases} \tag{B1}$$
Tests with all datasets of this study have indicated that the corresponding soil moisture average deviates from the exactly-weighted average by $\Delta\theta_v < 2\,\%$ (not shown). However, the deviation highly depends on $h$ and $\theta$ and thus can be an important source of error in temporal analysis where large ranges of humidity are expected. Also note that the integral of the approximated function no longer scales with neutron intensity, which however has no impact on normalized weights.

Further studies will demonstrate whether eq. B1 is accurate enough to improve the CRNS performance under various wetness conditions and in different sites. If so, the reduction of computational effort will be valuable for regular analysis and for end users in the applied sector.
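As an illustration of how eq. B1 might be applied to point data, the following minimal Python sketch computes normalized horizontal weights and the resulting weighted average. It is not part of the supplementary scripts. The names `w_r_approx`, `weighted_mean`, `f_p`, and `f_veg` are hypothetical, the scaling factors $F_p$ and $F_\mathrm{veg}$ must be supplied by the user (their formulas are not reproduced here; both default to 1, i.e., $r^* = r$), and the sample values are made up.

```python
import math


def w_r_approx(r, f_p=1.0, f_veg=1.0):
    """Approximated horizontal weight W_r^* (eq. B1) for a sample at distance r (m).

    f_p and f_veg are the pressure and vegetation scaling factors of eq. A1;
    the defaults of 1 are an assumption that leaves the distance unscaled.
    """
    if r <= 0:
        raise ValueError("eq. B1 is defined for r > 0 only")
    r_star = r / f_p / f_veg  # rescaled distance, eq. A1
    w = 30.0 * math.exp(-r_star / 1.6) + math.exp(-r_star / 100.0)
    if r <= 1.0:
        # additional factor of eq. B1 for the first metre around the detector
        w *= 1.0 - math.exp(-3.7 * r_star)
    return w


def weighted_mean(theta, r, f_p=1.0, f_veg=1.0):
    """Horizontally weighted average of point soil moisture values theta at distances r."""
    if len(theta) != len(r):
        raise ValueError("theta and r must have the same length")
    weights = [w_r_approx(ri, f_p, f_veg) for ri in r]
    return sum(w * t for w, t in zip(weights, theta)) / sum(weights)


# Illustrative values only (hypothetical): soil moisture and distance of four samples.
theta_samples = [0.10, 0.15, 0.25, 0.30]
r_samples = [1.0, 5.0, 25.0, 100.0]
print(round(weighted_mean(theta_samples, r_samples), 3))
print(round(sum(theta_samples) / len(theta_samples), 3))  # equal average for comparison
```

Because only normalized weights enter the average, the absolute scale of $W_r^*$ is irrelevant here. In this example the weighted value lies well below the equal average, since the driest sample is also the one closest to the detector.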
Appendix C: Toolbox for spatial weighting of point data
Proper horizontal and vertical weighting of point measurements is a prerequisite for validation and calibration of the CRNS method. Before the publication of Köhli et al. (2015) almost all users of CRNS probes avoided horizontal weighting. However, the revised neutron physics model reveals a highly non-linear shape of the detector's radial sensitivity. The corresponding publication has been distributed with supplemental material that provided the weighting functions $W_r$ as ready-to-apply Excel, R and MATLAB scripts. As the present study advanced the analytical fits of the spatial sensitivity functions (Appendix A), the corresponding updated script files can be found in the supplementary material. Moreover, an easy-to-use toolbox has been prepared in the form of an Excel sheet to guide users through the weighting process. This sheet takes a snapshot of point data around the sensor and calculates the corresponding CRNS footprint $R_{86}$, the average penetration depth $D_{86}$, and the weighted average soil water content according to the guidelines in this manuscript.