A climatological benchmark for operational radar rainfall bias reduction

The presence of significant biases in real-time radar quantitative precipitation estimations (QPEs) limits its use in hydrometeorological forecasting systems. Here, we introduce CARROTS (Climatology-based Adjustments for Radar Rainfall in an OperaTional Setting), a set of fixed bias reduction factors, which vary per grid cell and day of the year. The factors are based on a historical set of 10 years of 5 min radar and reference rainfall data for the Netherlands. CARROTS is both operationally available and independent of real-time rain gauge availability and can thereby provide an alternative to current QPE adjustment practice. In addition, it can be used as benchmark for QPE algorithm development. We tested this method on the resulting rainfall estimates and discharge simulations for 12 Dutch catchments and polders. We validated the results against the operational mean field bias (MFB)-adjusted rainfall estimates and a reference dataset. This reference consists of the radar QPE, that combines an hourly MFB adjustment and a daily spatial adjustment using observations from 32 automatic and 319 manual rain gauges. Only the automatic gauges of this network are available in real time for the MFB adjustment. The resulting climatological correction factors show clear spatial and temporal patterns. Factors are higher away from the radars and higher from December through March than in other seasons, which is likely a result of sampling above the melting layer during the winter months. The MFB-adjusted QPE outperforms the CARROTS-corrected QPE when the countryaverage rainfall estimates are compared to the reference. However, annual rainfall sums from CARROTS are comparable to the reference and outperform the MFB-adjusted rainfall estimates for catchments away from the radars, where the MFB-adjusted QPE generally underestimates the rainfall amounts. This difference is absent for catchments closer to the radars. QPE underestimations are amplified when used in the hydrological model simulations. Discharge simulations using the QPE from CARROTS outperform those with the MFB-adjusted product for all but one basin. Moreover, the proposed factor derivation method is robust. It is hardly sensitive to leaving individual years out of the historical set and to the moving window length, given window sizes of more than a week.


30
A large number of correction methods is already available. These methods range from corrections prior to the rainfall estimations, e.g. corrections for physical phenomena such as ground clutter, attenuation, the vertical profile of reflectivity and variations in raindrop size distribution (e.g., Joss and Pittini, 1991;Berenguer et al., 2006;Cho et al., 2006;Uijlenhoet and Berne, 2008;Kirstetter et al., 2010;Qi et al., 2013;Hazenberg et al., 2013Hazenberg et al., , 2014, to statistical post-processing steps for bias removal in the radar QPE using rain gauge data. These post-processing methods either merge rain gauge and radar QPE from 35 the same interval or base correction factors on the total precipitation in both products over a past period, such as a number of rainy days (e.g. seven days in Park et al., 2019). An often used method is the mean field bias (MFB) correction method, which determines a spatially-averaged correction factor from the ratio between rain gauge observations and the radar QPE of the superimposed grid cells at the locations of these gauges (Smith and Krajewski, 1991;Seo et al., 1999). This method, which is used operationally in the Netherlands and many other countries (Holleman, 2007;Harrison et al., 2009;Thorndahl et al., 40 2014; Goudenhoofdt and Delobbe, 2016), does not account for any spatial variability in the radar QPE bias, even though the bias is known to increase with increasing distance from the radar (Joss and Lee, 1995).
It is possible to account for this spatial variability with geostatistical techniques (e.g. ordinary kriging, kriging with external drift or co-kriging, Krajewski, 1987;Creutin et al., 1988;Wackernagel, 2003;Schuurmans et al., 2007;  same 31 automatic rain gauges are used for the MFB adjustment method, which will be introduced in Sec. 2.2.1. In contrast to 85 the spatially uniform MFB adjustment, the observations from the manual rain gauges are used for spatial adjustments, based on distance-weighted interpolation of these observations (Barnes, 1964;Overeem et al., 2009b). This product is considered as a reference rainfall product in the Netherlands and it is therefore also regarded as reference here (referred to as R A in this study).
The R A data is not available in real time (available with a delay of one to two months), but it is archived and can therefore be used for 'offline' methods. Both R A and R U have a 1-km 2 spatial and 5-min temporal resolution.

90
The year 2008 is actually the first year in the KNMI archive of both data sets, but it was left out of the analysis here. R U for this year showed a significantly different behaviour than the other years, especially during the first half year in which the product rarely underestimated and frequently even overestimated the rainfall sums. The reason for this behaviour is not yet fully understood. KNMI (2009) reported that spring was exceptionally dry in the north of the country and that the months January and May were among the warmest on record. On some days with overestimations, clear bright band effects were visible in the 95 radar mosaic, which may have contributed to the systematic differences. . Scatter plot indicating the systematic discrepancy between the reference rainfall (RA) and the unadjusted radar QPE (RU). Shown are the daily country-average rainfall sums based on ten years (2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018), classified per season. The slope of a linear fit between the two rainfall products is shown in the bottom right for all observations together (indicated with 'sT OT ') and per season. Figure 2 indicates the need for correction of the radar rainfall product. R U systematically underestimates the true rainfall amounts, averaged for the land surface area of the Netherlands, with 56%. This bias is not uniform in space, as will be highlighted in Sec. 3, and in time with higher underestimations during winter (on average 65%) than during the other seasons 100 (53 -55 %). In the following two subsections, the operationally used MFB adjustment method and the in this study proposed CARROTS method will be introduced.

Mean field bias adjustment
The mean field bias (MFB) adjustment method is the operational adjustment technique in the Netherlands and it was used in this study for comparison with the proposed climatological bias reduction method (Sec. 2.2.2). This method provides a spatially 105 uniform multiplicative adjustment factor that is applied to R U . The adjustment factor (F MFB ) was calculated as (Holleman, 2007;Overeem et al., 2009b): with G(i n , j n ) the hourly rainfall sum for gauge n at location (i n , j n ) and R U (i n , j n ) the unadjusted hourly rainfall sum for the corresponding radar grid cell. The calculation of F MFB only took place when both the rainfall sum of all rain gauges together 110 and the sum of all corresponding radar grid cells was at least 1.0 mm. In all other cases, F MFB = 1.0.
The MFB adjustment factors were determined from the 1-hr accumulations of both R U and the 31 automatic rain gauges, as only the automatic gauges were operationally available every hour (Holleman, 2007;Overeem et al., 2009b). The adjustment factors at the temporal resolution of the radar QPE (5 min) were assumed to equal the 1-hr adjustment factors for a given hour.
Moreover, this analysis took place with archived datasets, which were validated and consisted of quality-controlled rain 115 gauge observations. It is noteworthy that the same quality control is absent and that missing data occurs in real-time, which can lead to deteriorating results when the MFB adjustment is applied in an operational test case.

CARROTS method
To derive the climatological bias correction factors for the CARROTS method, both R U and R A were used for the years 2009-2018. The use of the reference data for this method was possible, because the CARROTS method did not require a real-time 120 availability of the data. The bias correction factors were determined per grid cell in the radar domain according to the following three steps: 1. For every day in the period 2009-2018, all 5-min rainfall sums (both R U and R A ) within a moving window of 31 days (the day of interest plus the fifteen days before and after it) were summed. The purpose of the moving window was to smooth the systematic day-to-day variability of the estimated rainfall in the 10-year data. Sections 2.4 and 3.4 describe 125 the sensitivity of the method to the moving window size.
2. For every day of the year, the 31-day sums around that day were averaged over the ten years. Thus, the value for e.g. 16 January consisted of the average 31-day sum for the period 1 to 31 January over the ten years.
3. Finally, gridded climatological adjustment factors (F clim ) were calculated per day of the year as:

130
with R A (i, j) the reference rainfall sum and R U (i, j) the unadjusted (operational) radar rainfall sum at grid cell (i, j) for the ten years.

Application to twelve basins
Both bias adjustment methods were applied to the ten years (2009-2018) of R U . In order to provide a hydro-meteorological testbed, both the CARROTS and MFB adjusted QPE products (from here-on referred to as R C and R MFB , respectively) were 135 validated against the reference rainfall. In addition, R C and R MFB were used as input for the rainfall-runoff models of twelve basins (a combination of catchments and polders) in the Netherlands (Fig. 1). Most of the involved water authorities use these (lowland) rainfall-runoff models either operationally or for research purposes, often embedded in a Delft-FEWS system (Werner et al., 2013). For this reason, most models were already calibrated (e.g. Brauer et al., 2014b;Sun et al., 2020). SOBEK RR(-CF) (Stelling and Duinmeijer, 2003;Stelling and Verwey, 2006;Prinsen et al., 2010) is semi-distributed and therefore 140 we used sub-catchment averaged rainfall sums from the gridded radar QPE. WALRUS (Brauer et al., 2014a) is lumped, so the radar QPE was catchment averaged and used as input. A more detailed description of both rainfall-runoff models is outside the scope of this paper. All twelve model setups were run with a 5-min time step for the period 2009-2018. The resulting discharge simulations were validated for the same period using the Kling-Gupta Efficiency (KGE) metric (Gupta et al., 2009).

Sensitivity analysis 145
As mentioned in Sec. 2.2.2, the purpose of the 31-day moving window in the factor derivation of CARROTS was to smooth the day-to-day variability of rainfall. To test the sensitivity of the method to the employed moving window size, the adjustment factors were re-derived for a range of moving window sizes (1 day, 1 week, 2 weeks, 6 weeks and 2 months). The derived factors were then compared to the original factor in this study, which was based on a moving window size of 31 days, and used to derive adjusted QPE products. Subsequently, these QPE products served as input for one of the 12 catchments, namely the 150 WALRUS model for the Aa catchment ( Fig. 1), to test the effect on the simulated discharges. The Aa catchment was chosen because the unadjusted QPE product (R U ) for this catchment has one of the highest biases of the twelve studied catchments (see Sec. 3 and Fig. 4).
Besides the moving window choice, the length of the radar rainfall archive (ten years) was finite. To test whether or not this archive length was sufficient for reaching a stable factor derivation, individual years in the ten-year archive were left out of 155 the CARROTS method. Hence, the adjustment factors were recalculated ten times, applied to R U and used as input for the WALRUS simulations for the Aa catchment.

Seasonal variability
The adjustment factors from CARROTS present the spatial variability in the radar QPE errors, with generally higher adjustment 160 factors towards the edges of the radar domain (Fig. 3). This difference is most pronounced from December through March, with more than two times higher factors in the south and east of the country than in the central and northwestern parts (Fig. 3a, b and l). An explanation for these higher adjustment factors from December through March is that radar QPE often severely underestimates the rainfall amounts for stratiform systems, which regularly occur during the Dutch winter. This especially holds when the QPE is constructed from reflectivities sampled above the melting layer (Fabry et al., 1992;Kitchen and Jackson, 1993; isotherm level, using a constant wet adiabatic lapse rate of 5.5 • C km −1 with ground temperature data for all rainy hours in the ten years (Fig. 4b), indicates that the 1500 m pseudo-CAPPI is generally above the 0 • C isotherm level from December through March. This coincides with the months with higher adjustment factors (Fig. 4c) and could thus explain the winter effect on the adjustment factors. This effect is presumably even stronger further away from the radars, because the QPE product consists of samples at even higher altitudes than 1500 m for locations at more than 120 km from the radars. Besides, an additional  the superimposed grid cell of this station. Note that for this analysis, the adjustment factor was based on only the rainfall sums within that month, the effective adjustment factor for that month, which roughly coincides with the factor for the 15 th of the month in the CARROTS method. The grey bars indicate the interquartile range (IQR) for that month, based on the spread in hourly 0 • C isotherm level estimates (the horizontal bars) and the sensitivity to leaving out individuals years in the ten year period for the factor derivation (vertical bars).
dependence of the monthly factor on the time of year that cannot be explained by temperature, seems to be present with lower adjustment factors during spring and early summer and higher factors for the subsequent period (Fig. 4c). Luntersebeek (

Annual rainfall sums
An advantage of the MFB adjustment is that it corrects for the circumstances during that specific day and thus also for instances with overestimations (Fig. 4a). The negative effect of the spatial uniformity of the factor, however, becomes apparent in Fig. 5,   180 which compares the annual precipitation sums of the two adjusted radar rainfall products with the reference and R U for four of the catchments. Both adjusted products manage to significantly increase the QPE towards the reference, which is clearly needed, as R U underestimates the rainfall amounts by 50% or more for these four catchments. The CARROTS QPE (R C ) outperforms the MFB adjusted QPE (R MFB ) for the Aa and Regge catchments, which are located in the far south and east of the country, respectively. R MFB still underestimates the annual reference rainfall sums with on average 20% for the Aa and 13% for the Regge, while this is on average only 5% (both under-and overestimations) for R C .
The MFB adjusted QPE performs better for the Dwarsdiep polder (10% underestimation) and Luntersebeek catchment (6% underestimation) due to their location in the radar mosaic. The Luntersebeek catchment (central Netherlands, Fig. 1) is located closer to both radars. There, R MFB generally performs better and sometimes even overestimates the true rainfall, which is consistent with Holleman (2007). Holleman (2007) also indicates that the MFB adjustment performs well in the north of the 190 country, where the Dwarsdiep catchment is located, even though this region is closer to the edge of the radar domain. The CARROTS QPE tends to overestimate the rainfall amount of both basins for some years (e.g. with 16% for the Luntersebeek in 2016). So, the annual rainfall sums are not necessarily better estimated by R C for these two catchments.
Summarizing, the CARROTS factors have a clear annual cycle, with generally higher adjustment factors further away from the radars (Sec. 3.1). This results in estimated annual rainfall sums that are closer to the reference than with the MFB adjusted 195 QPE for regions close to the edges of the radar domain. This effect is expected to become more pronounced when the adjusted QPE products are used for discharge simulations.

Effect on simulated discharges
The severe underestimations of R U have a considerable effect on the discharge simulations for the twelve basins (Fig. 6). This leads to hardly any discharge response and thus negative KGE values for most basins as compared to discharge simulations with 200 the reference rainfall data. The effect is most pronounced for the freely draining catchments in the east and south of the country.
These catchments are more driven by groundwater flow than the polders in the west of the country. Groundwater flow gets hardly replenished, because of similar estimated annual evapotranspiration and R U sums, resulting in too low baseflows. The polders, especially Delfland and Beemster, are an exception to this, because they are less driven by groundwater-fed baseflow and more by direct runoff from greenhouses or upward seepage flows, which makes them more responsive to individual rainfall 205 events leading to higher KGE values (with R U as input) compared to the other basins.
The model runs using R MFB as input significantly improve the simulated discharges, compared to the runs with R U . Nevertheless, the model runs still strongly underestimate the simulated discharges compared to those from the reference runs for the catchments in the south and east of the country (Fig. 6a-f). This is particularly noticeable for the catchments Reusel (KGE = 0.26) and Roggelsebeek (KGE = 0.04). The spatial uniformity of the MFB factors is identified as the cause of these effects, 210 because the MFB method can not correct for the sources of errors leading to the biased QPE in space. This already led to clear underestimations in the annual rainfall sums for these regions (Fig. 5).
The CARROTS QPE outperforms R MFB , when this product is used as input for the twelve rainfall-runoff models. This is not exclusively the case for the six catchments in the east and south of the country (Fig. 6a-f), but also for the other polder and catchment areas. The exception to this is the Beemster polder (which is mostly upward seepage driven), although the difference Only the simulated discharges for 2015 are shown here for clarity; the KGE is based on all years.

Sensitivity analysis
The use of a different moving window size hardly influences the CARROTS factors for moving window sizes of two weeks or longer, but this does not hold for moving window sizes of a day or, to a lesser extent, one week (Fig. 7a). The factor derived with a moving window size of one day fluctuates heavily from day to day. This suggests that the adjustment factor is still quite 220 sensitive to individual events in the 10-year period, when a moving window size of seven days or less is used. In contrast to this, the differences between these six sets of CARROTS factors (Fig. 7a) lead to minimal variations in the simulated discharges for the Aa catchment, when these factors are used to adjust the input QPE (Fig. 7b). Differences in timing and magnitude (0.2-0.3 mm d −1 ) are visible during peaks and recessions, for instance in early April. However, these are small compared to the differences between the model runs with R C and R MFB (Fig. 6). Concluding, a 31-day smoothing of the climatological 225 adjustment factor is warranted.
In addition, leaving individual years out of the ten-year archive has a limited impact on the CARROTS factors (see also the vertical bars in Fig. 4c). Similar to the aforementioned results for the moving window size analysis, it leads to hardly any variations in the simulated discharges for the Aa catchment (not shown here). This suggests that the ten-year archive length was sufficiently long for the factor derivation.

230
In this study, we introduced the CARROTS method to derive adjustment factors that reduce the bias in radar rainfall estimates.
We derived these factors using 10 years of 5-min radar and reference rainfall data for the Netherlands. The method and resulting QPE product generally outperformed the mean field bias (MFB) adjustment that is used operationally in the Netherlands, especially when the QPE products were used as input for hydrological model runs.

235
The main difference with the MFB adjustment is the presence of a high-density network of (manual) rain gauges in the reference dataset, a dataset that is not available in real-time. This allows for spatial adjustments. Overeem et al. (2009b) demonstrate that this reference dataset mostly depends on the daily spatial adjustments from the manual rain gauges, while the higher-frequency MFB adjustment based on the automatic gauges plays a smaller role in the adjustments of this reference product. According to Saltikoff et al. (2019), at least 40 countries have an archive of historical radar data for a period of ten 240 years or more. The proposed CARROTS method is potentially valuable for these countries, especially when the density of their network of automatic rain gauges is, similar to the Netherlands, significantly smaller than the total network of rain gauges.
MFB adjustment of radar rainfall fields is still the most frequently applied adjustment method (Holleman, 2007;Harrison et al., 2009;Thorndahl et al., 2014;Goudenhoofdt and Delobbe, 2016). The results indicate that this choice may be reconsidered, at least for the Netherlands. This could also hold for other regions, especially mountainous regions where the uniformity 245 of the MFB adjustment factor is likely not sufficient to correct for all orography-related errors (Borga et al., 2000;Anagnostou et al., 2010).
However, the proposed CARROTS method has to be recalculated for every change in the radar setup, calibration, additional post-processing steps (e.g. VPR corrections, Hazenberg et al., 2013) or final composite generation algorithm. For instance, including a new radar in the composite would require a recalculation of the adjustment factors, thereby assuming the presence 250 of an archive of the new composite product. This could potentially limit the usefulness of the proposed method.
Although the results are promising, this method is not expected and meant to outperform more advanced spatial QPE adjustment methods, such as geostatistical and Bayesian merging methods (for an overview of methods and their limitations, see Ochoa-Rodriguez et al., 2019). A major advantage of these methods is the real-time derivation of spatial adjustment factors, in contrast to the proposed method in this study, which was solely based on historical data. The MFB adjustment factors can also 255 be derived in near real-time, but are uniform in space, which can explain the worse performance as compared to the proposed method in this study. A disadvantage of geostatistical and Bayesian merging methods is that they are computationally expensive and require the real-time availability of a dense network of rain gauges. Instead, we consider the proposed climatological radar rainfall adjustment method as a benchmark for the development and testing of operational radar QPE adjustment techniques.
In essence, this means that a QPE adjustment method should at least be able to outperform the proposed method in this study. conditions and resulting QPE errors, which could lead to considerable errors during extreme events. Nonetheless, this is also the case for the MFB adjustment technique (Schleiss et al., 2020). The absolute errors for the 10 highest daily sums in this study for the Aa and Hupsel Brook catchments (one of the largest and the smallest catchment in the study) are similar for the MFB and climatological adjustment methods, with on average a 20% difference with the reference (this would have been 50-60 % without corrections). Note that for individual events in these twenty extremes, the errors can still reach 48% for the 270 QPE adjusted with CARROTS and 64% for the MFB adjusted QPE.
Finally, the CARROTS factors were derived with the reference rainfall data for the Netherlands. The same data was used as reference in this study. Although the use of the same data as training and validation set is sub-optimal, leaving out individual years has had a limited impact on the estimated adjustment factors and the resulting QPE and discharge simulations (see also the vertical bars in Fig. 4c).

Conclusions
A known issue of radar quantitative precipitation estimations (QPE) are the significant biases with respect to the true rainfall amounts. For this reason, radar QPE adjustments are needed for operational use in hydro-meteorological (forecasting) models.
Current QPE adjustment methods depend on the timely availability of quality-controlled rain gauge observations from dense networks. This especially applies to methods that correct for the spatial variability in the QPE errors. To overcome this is- The CARROTS factors show clear spatial and temporal patterns, with higher adjustment factors towards the edges of the radar domain. This is caused by larger QPE errors further away from the radars. The factors are also higher from December through March than in other seasons. This is likely a result of sampling above the melting layer during these months, which 290 causes higher underestimations in the unadjusted radar rainfall product.
Although the MFB factors are based on the current over-or underestimations in the QPE, the factor is spatially uniform and does not correct for spatial errors. This directly impacts the adjusted QPE. The MFB adjusted QPE leads to annual rainfall sums that still underestimate those of the reference for the catchments in the east and south of the country (towards the edge of the radar domain). This bias is almost absent for the CARROTS factors (up to 5% over-and underestimation for the same 295 catchments). For basins closer to radars, this effect decreases and both adjustment methods perform well.
The effects of both adjustment methods on the QPE is amplified when they are used as input for the rainfall-runoff models of the twelve studied basins. The discharge simulations with the CARROTS QPE outperforms those using the MFB adjusted QPE for all but one basin. For the Netherlands, these results indicate that the operationally used MFB adjustment performs worse than the proposed climatological adjustment factor for hydrological applications.

300
Despite the aforementioned results, the CARROTS method has two main limitations: (1) for every change in the radar setup, the radar calibration, post-processing algorithms or the final composite generation method, the adjustment factors have to be recalculated; (2) the factor is not calculated for the actual meteorological conditions and resulting QPE errors, which could lead to considerable errors during extreme events. Nonetheless, the latter is also the case for the MFB adjustment technique (Schleiss et al., 2020), even though the MFB factors are derived in real-time.

305
The main advantage of the introduced method is the continuous availability of spatially distributed adjustment factors, due to the independence of timely rain gauge observations. This is beneficial for operational use. In addition, the CARROTS factors are shown to be robust, as the derivation is not found to be sensitive to leaving out individual years or the used moving window, especially when this window is longer than a week.
Finally, this method is not expected and meant to outperform more advanced spatial QPE adjustment methods (which require 310 data from dense rain gauge networks for robust application), but it can serve as a benchmark for the development and testing of more advanced operational radar QPE adjustment techniques. QPE adjustment methods (including CARROTS) greatly benefit from a denser, frequently-available rain gauge network. From that perspective, crowd-sourced personal weather stations hold a promise for improving radar rainfall products, given their direct surface measurements and dense networks (Vos et al., 2019). This also holds for rain gauge observations from other governmental or third parties, e.g. the water authorities in the 315 Netherlands. Hence, we think that this could further improve radar rainfall products in the near future.  Fig. 3 and 4 are described in Crameri (2018) and Crameri et al. (2020), and are available via: https://doi.org/10.5281/zenodo.