Introduction
Erosion is a natural geological phenomenon resulting from the removal and
transportation of soil particles by water, wind, ice, and gravity. Soil erosion by water is a widespread
problem throughout Europe . Erosion is usually
driven by a combination of factors like climate (e.g. long dry periods
followed by heavy rainfall), topography (steep slopes), inappropriate land
use, land cover patterns (e.g. sparse vegetation), ecological disasters
(e.g. forest fires), and soil characteristics (e.g. a thin layer of topsoil,
a silty texture, or low organic matter content).
Although measurements of soil erosion exist, they are often used as a basis
for development, modification, or verification of soil erosion models
(applicable at larger scales), which relate soil loss to indicators of
relevant factors. A classic example is the Universal Soil Loss Equation
(USLE, ) or the Revised USLE . Both methods
express the long-term average annual soil loss as a product of rainfall
erosivity factor (R factor), soil erodibility factor, slope length factor,
slope steepness factor, cover-management factor, and support practice factor.
The experimental data indicate that when factors other than rainfall are held
constant, soil losses from cultivated fields are directly proportional to an
erosivity index (EI30) calculated as the total rainfall kinetic energy times
the maximum 30 min intensity . The R factor is then
obtained as a long-term average annual rainfall erosivity index. In addition
to a point estimate of the R factor, a spatial information (map) is often
required for practical purposes. A number of such maps have been released
recently at national (; ;
; ;
; ;
) or larger scales.
Because of large spatiotemporal variability of rainfall, long records from a
dense network of stations are in general required in order to provide
reliable estimates of the R factor and/or to develop rainfall erosivity
maps. For instance, or recommend at least
22 years of data to be used for estimation of the R factor. On the other
hand, lack of high-resolution rainfall data in combination with a need for
soil erosion risk assessment often leads to situations where the R factor
has to be estimated from shorter records. For instance, a number of stations
used recently for the derivation of rainfall erosivity maps for Europe
were shorter than 20 years and several records shorter
than 10 years were considered. Similarly, the comparison of the spatial
interpolation methods in the Ebro Basin was based
on 10 years of data (for a large number of stations) and
assessed the uncertainty in the estimated R factor considering a 13-year
record for eight stations. In general, the uncertainty in the estimated
R factor increases as the length of the available record decreases, but can
be reduced by combination of data from different sites or
by considering covariates that are better sampled and/or whose variation over
space and time is smaller . For the spatial
interpolation of the R factor, variables like longitude, latitude, and
elevation or long-term
precipitation are often considered.
In addition to spatial and temporal variability, the expressions for the
rainfall kinetic energy (needed for estimation of the erosivity index) and
spatial interpolation are also relevant sources of uncertainty for the
development of an R factor map. The rainfall kinetic energy can be
estimated by a number of expressions see e.g..
Therefore, several authors assessed the effect on the estimated R factor.
For instance, mentioned a variation due to kinetic energy
calculation of about 10 %. Similarly, many spatial models can be used to
predict the R factor values over the area. The differences between several
spatial interpolation methods have been reported e.g. by
. compared the contribution of
different sources of variability to the overall uncertainty in the estimated
basin average R factor in north-eastern Spain, concluding that while the
uncertainty in the annual erosivity index is dominated by the temporal
variability (explaining more than 40 % of variation) for the long-term
R factor, the kinetic energy calculation becomes more important.
Several maps of the R factor for the whole of Europe have been released in
recent decades. For instance, applied simple
relationships between seasonal and annual rainfall and the R factor and
used many high-resolution station datasets (with various
record lengths and observation periods) to derive an R factor map based on
a Gaussian process model. The values derived for the Czech Republic from
these maps range between 350 and 700 MJ ha-1 mm h-1.
A number of estimates of the R factor have been also published specifically
for the Czech Republic. See e.g. the overview by , who also
mention results from a number of national projects. Official guidelines for
soil erosion risk assessment recommend the value
200 MJ ha-1 mm h-1 to be used for all agricultural land in the
Czech Republic. Only recently was this value increased to
400 MJ ha-1 mm h-1 .
These values are relatively small with respect to the
neighbouring countries and some of the
research published for the area. For instance, published
values of the R factor in the range of
430–1060 MJ ha-1 mm h-1 and concluded that considerably larger
values of the R factor (450–600 MJ ha-1 mm h-1) should be
used for practical application of USLE in the Czech Republic (instead of
200 MJ ha-1 mm h-1). Even larger values of the R factor are
reported by . On the other hand, , using
a regression between the daily erosion index (EI30) and daily precipitation
in order to predict annual EI30, report values of the R factor between
150 and 1200 MJ ha-1 mm h-1, with an average for arable land of
300–400 MJ ha-1 mm h-1. Similarly,
derived an R factor for the Czech Republic from daily data considering the
fraction of erosive events in each year for each station and the
areal-average annual sum of the erosivity index. In addition, they excluded
years with the largest and years with the smallest erosivity index from the
analysis. This resulted in a recommendation to use
400 MJ ha-1 mm h-1 for all agricultural land in the Czech
Republic. The trends in annual rainfall erosivity index were studied by
, who found a significant increasing trend
(≈ 4 % per decade) in 51-year records for 11 stations (more than
half of the considered stations). However, recent values of the erosivity
index were not exceptional when compared to a longer (91-year) record
available at a single station. Considerable projected increases in the
rainfall erosivity index in an ensemble of regional climate model simulations
for the Czech Republic were reported by .
The present paper compares several methods of spatial interpolation of the
R factor over the Czech Republic and evaluates the bias and uncertainty due
to expression for the kinetic energy, spatial model, record length, and
spatial coverage. The paper is structured as follows. The study area and data
considered for calculation and spatial interpolation of the R factor are
given in Sect. . Section describes the methods used
for estimation of the R factor, spatial interpolation, and uncertainty
assessment. The results are presented and discussed in Sect. .
The paper is closed with concluding remarks (Sect. ).
The study area with altitudes and locations of 96 stations used for
derivation of the R factor map (left panel) and mean annual precipitation
together with locations of additional stations used for uncertainty
assessment (right panel).
Study area and data
Study area
Because of relatively complex orography and a combination of Atlantic,
Mediterranean, and continental effects, the precipitation patterns over the
area of the Czech Republic are rather variable (see Fig. ).
The precipitation is mostly due to enhanced westerly flows in winter
(Atlantic influence), except for the eastern part of the Czech Republic,
which is typically influenced by the Mediterranean Sea. The Mediterranean
influences are also dominant over the whole area in summer
e.g..
The mean annual total precipitation varies from about 400 mm in the western
part of the Czech Republic up to more than 1400 mm in the mountains to the
north . Almost two-thirds of the annual total falls in the
warm half of the year (April–September). The maximum precipitation amounts
can be quite large at short timescales, thus contributing considerably to the
annual total precipitation. point to historical records of
237 mm in 1 h (25 May 1872), 345 mm in 1 day (29 July 1897), 537 mm in
3 days (6–8 July 1997), and 617 mm in 5 days (4–8 July 1997).
Pluviograph records
A database of 10 min digitized pluviograph records was used in our study.
Since a considerable amount of precipitation falls as snow during winter, the
data are in general available only for mid-May to mid-September (further
referred to as the year). This is, however, the period when heavy
precipitation usually occurs. The data were digitized by the Czech
Hydrometeorological Institute (CHMI). Detailed identification and
reconstruction of unreadable, damaged, or missing records comprised part of
the digitization process seefor details. Consistency of
the aggregated 10 min data with daily records from control ombrometers was
further checked by . Identified inconsistent days were
marked as unreliable and the years with more than 10 % of unreliable data
were excluded from the analysis. A set of 96 stations covering the study area
for the period 1989–2003 was selected (based mainly on data availability) in
order to provide reasonable spatial coverage and record length for the
spatial interpolation of rainfall erosivity. The distance between
neighbouring stations is 5–42 km (17 km on average) and the stations are
located at altitudes from 150 to 1118 m (450 m on average). Note that
although a significant positive trend in the annual erosivity index has been
reported by for several stations in the Czech Republic,
no clear trend is observed (not shown) in the considered period (1989–2003).
Detailed information on the considered stations is included in the Supplement
to this paper.
To analyse the temporal variability and the effect of the number of stations
used for the spatial interpolation, the longest available record (C2TREB01 –
Třeboň), with 80 years of data and an additional 40 stations (usually
with a slightly different time period than 1989–2003), were also considered
(see Sect. ). The location of the stations is indicated in
Fig. .
The database of 10 min digitized pluviograph records was also used for the
estimation of average at-site rainfall event characteristics (event depth,
event duration, maximum 10 min intensity, mean event rate, time to peak,
depth to peak, and number of events), which were later linked with selected
spatial interpolation models seefor details of the considered
rainfall event characteristics.
Gridded precipitation data
Gridded (1 km resolution) statistics of monthly precipitation for the
period 1989–2003 (the same as for the station data) provided by the CHMI
were considered as covariates for the spatial interpolation models.
Specifically, we used the average (May–September)
precipitation (rmea), the coefficient of variation of monthly
(May–September) precipitation (rcv), and the mean fraction of
precipitation above the 95 % quantile of monthly (May–September)
precipitation (rp95). Please note that these gridded statistics are
based on much larger numbers of stations (usually more than 750 stations were
available for each month and year) than were used in our study for spatial
interpolation of the R factor.
Methods
Rainfall erosivity factor
A standard methodology for the assessment of the at-site rainfall erosivity
factor was applied. A continuous 6 h interval without
precipitation is used to separate individual rainfall events. To be
considered erosive, the cumulative rainfall of an event should be greater
than 12.7 mm, or the event should have at least one peak greater than
6.35 mm in 15 min. The latter criterion was modified to 8.5 mm in 20 min
in order to match the temporal resolution of our dataset. A similar
modification was reported by . The proportion of
erosive events that are included on the basis of the intensity criterion is,
however, small.
The rainfall erosivity factor is defined as a long-term average annual sum of
the event erosivity index (EI30
(MJ mm ha-1 h-1 yr-1), ),
EI30=I30∑ieivi,
where
ei=28.31-0.52exp-0.042ri
is the unit rainfall energy (MJ ha-1 mm-1) ,
vi and ri are the rainfall volume (mm) and intensity (mm h-1)
during a time interval i, respectively, and I30 is the maximum
rainfall intensity during a period of 30 min in the event (mm h-1).
The unit rainfall energy in Eq. () is from , who
assessed many expressions for its calculation. To assess the related
uncertainty, the R factor was also estimated using additional
14 expressions for ei (see Appendix for their definition).
Spatial interpolation models
An at-site dataset of rainfall erosivity factors was further explored using
spatial interpolation models. Following the classification of spatial
interpolation models for rainfall erosivity provided by
, selected local, geostatistical, mixed, and global
models were tested.
The local models, which described the spatial distribution of rainfall
erosivity using local at-site information of the R factor, were represented
by the model based on inverse distance weighting (IDW; see
e.g. ; ). The IDW
model predicts the R factor as a weighted average of R factor estimates
from a selected number of stations. Weights are inversely proportional to the
distance (between the interpolated location and the corresponding station)
raised to the power r. The number of considered stations and the value of
parameter r were estimated during the model cross-validation (see
Sects. and ).
The geostatistical models were represented by simple kriging (SK), ordinary
kriging (OK), simple cookriging (SC), and ordinary cokriging (OC) models
. The arithmetic mean
of at-site R factor values was used as the estimator of mean values for
simple kriging and cokriging models. The cokriging models were tested using a
set of seven rainfall event characteristics (see Sect. )
as a cokriging variate. Their spatial covariance structures were described by
the Matérn model .
The mixed models were represented using a set of different regression kriging
models, further denoted as RK . Their spatial
covariance structures of residuals were described using the Matérn model.
The spatially varied means of R factors were estimated using three types of inputs:
location information represented by longitude (x), latitude (y), and altitude (z);
spatial rainfall information expressed by the combinations of rmea, rcv, and rp95; and
information consisting of combinations of both previous types of spatial covariates.
The global spatial interpolation models were represented by the set of
generalized linear models (GLS). The deterministic components were estimated
using the same types of inputs as in the case of mixed models. Their random
error terms were represented using the Matérn and exponential models; the
heteroscedasticity of the GLS model residuals was studied by means of the
exponential variance function see p. 206 in. The
parameters of all tested GLS models were estimated with the restricted
maximum likelihood (REML) method .
Model selection
A standard leave-one-out procedure was used for the validation of spatial interpolation
models. The following indices were considered:
Willmott's agreement index (WI)WI=1-∑i=1nRSxi-RIxi2∑i=1n|RSxi-R‾Sxi|+|RIxi-R‾Ixi|2,
Mean absolute error (MAE)MAE=1n∑i=1n|RSxi-RIxi|,
Relative mean absolute error (rMAE)rMAE=1n∑i=1n|RSxi-RIxi|RSxi,
Root mean square error (RMSE)RMSE=1n∑i=1nRSxi-RIxi2,
where RS(xi) is the rainfall erosivity calculated for
station i, RI(xi) is the spatially interpolated value of
the rainfall erosivity for the same station, and
R‾S(xi) and R‾I(xi) are
the mean rainfall erosivities calculated for the station data and obtained
from spatial interpolation, respectively.
The choice of the validation criteria follows the discussion about RMSE and
MAE presented by , who recommended the use of MAE for the
estimation of average model error over the RMSE, since the RMSE can be
influenced by outlying observations. The WI was also considered for
consistency with other relevant studies for example,
see.
The model selection aimed at identification of robust spatial interpolation
methods from the considered groups and also at identification of optimal
structures and parameters of the particular models, including various
combinations of covariates, covariance models, and the number of nearest
stations considered for local, geostatistical, and mixed models.
Uncertainty assessment
The uncertainty in the derived R factor map originates from the formulation
of the kinetic energy term, the spatial interpolation model, and spatial and
temporal variability. While the effect of the former two can be assessed by
direct comparison of results for different expressions of rainfall kinetic
energy and different spatial interpolation models, the effect of spatial and
temporal variability is evaluated here by simple bootstrap resampling
procedures, which are briefly summarized in the rest of this section.
Temporal variability
Record length influences the width of the confidence interval around the
R factor estimate. Due to the temporal variability, sufficient record
length is required in order to provide an estimate of the R factor such
that the long-term average R factor (further denoted the “true
R factor”) would be covered by the estimated confidence interval.
Therefore, we derived the confidence intervals for various record lengths
together with the probability that the “true R factor” will lie within
the corresponding confidence interval (further denoted the “coverage
probability”) using our longest available record, i.e. station C2TREB01
(Třeboň) with 80 years of data. This can be done using a nested
bootstrap procedure in which a sample of required length (e.g. 10, 20, or
80 years) is drawn (with replacement) from the original record and further
resampled. This allows for determination of the confidence interval and
examination whether the “true R factor” is included. To obtain the
coverage probability, the whole procedure has to be repeated many times. Here
we evaluated the coverage probabilities for the record lengths of
10–80 years. For details, see Appendix .
Record length and spatial coverage
Long records from a dense network of stations should be ideally available to
derive an R factor map. In reality, however, long records are often
available only for a relatively small number of stations and a balance
between record length and spatial coverage has to be found. It is then not
clear whether longer records or better spatial coverage should be preferred.
Specifically, we considered (a) what the
relationship of the error in the estimated R factor with the spatial and
temporal coverage is and how spatial and temporal coverage influences it,
(b) the width of the confidence interval around the estimated R factor, and
(c) the coverage probability (i.e. the probability that the estimated
confidence interval will include the “true R factor”).
To be able to assess these questions, a simulation study was conducted. The
procedure is fully described in Appendix , and here we
provide only a general overview. As a reference, a synthetic dataset of the
monthly (May–September) erosivity index (EI30) was created by permutation of
10 years of data available for 120 stations as follows: first, a 100-year
long sequence of the months May–September was created and a random year
(from the available period) was assigned to each month in each year. Data for
each of the 120 stations were then rearranged according to this year–month
sequence and the data were aggregated by years. This resulted in a dataset of
100 years for 120 stations. This procedure preserves the annual cycle of the
erosivity index and its spatial variability, while it assumes independence of
the erosivity index between individual months. This dataset is further
denoted the “full dataset”. Please note that although many different
replications of this “full dataset” can be obtained, only one replication
is used in this study. We do not expect the results (presented further) to
vary significantly when different replication is considered.
The R factor in this “full dataset” was estimated and a simple GLS model
of the form R ∼ NAVY + rmea + Y was fitted. This model was
used to predict the R factor for an additional 62 locations (coincident
with real stations, but independent of the “full dataset”). This is further
denoted the “validation dataset”. The assessment of the effects of spatial
and temporal coverage was based on a repetitive resampling of the subsets of
the “full dataset”, fitting a GLS model and predicting the R factor for
the “validation dataset” (for details, refer to Appendix ).
Within the simulation study, we evaluated the RMSE, the width of the 90 %
confidence interval, and the coverage probability for record lengths of 5,
10, 15, 20, 30, …, 100 years for 10, 20, …, 120 stations. Since it
has been frequently noted that areal averaging is in general preferred to
spatial interpolation in the Czech Republic
, at least for agricultural
areas (in general at low altitudes), we also assessed this “areal-average
model” (i.e. a constant R factor for all locations). The “areal-average
model” was applied for the whole “validation dataset” and also considering only
the stations with altitudes below 600 m.
Comparison of uncertainties
To compare the contribution of different sources of uncertainty, the
R factor was predicted for the “validation data” using
the best GLS model of the R factor considering 15 different expressions
for kinetic energy (see Appendix );
selected classes of spatial interpolation models;
GLS models based on different numbers of stations and years (see Sect. ).
The coefficient of variation (CV) for each set of the estimated R factors was
calculated to summarize the variability due to different sources, similarly
as done by or . For comparison, we also
evaluated the CV for the R factor estimates, considering various record
lengths for the C2TREB01 (Třeboň) station (see
Sect. ). Finally, the CV for the R factor estimate for
each of the 96 stations (representing natural variability) was evaluated
using a simple bootstrap resampling of the annual erosivity values.
Results and discussion
Average R factor for subsets of stations with smaller elevations
than that plotted on the horizontal axis. The leftmost point corresponds to a
station with the lowest altitude, the rightmost point to the overall average
R factor.
R factor
The estimated R factor (considering the kinetic energy relationship
proposed by , Eq. ) for the 96 stations used
for spatial interpolation ranges between 320 (U1KOPI01 – Kopisty,
north-western Czech Republic) and 1520 (O1RASK01 – Raškovice,
north-eastern Czech Republic) MJ ha-1 mm h-1, and averages
640 MJ ha-1 mm h-1. Note that the second-largest R factor
(station H2DEST01 – Deštné v Orlických horách) equals
1108 MJ ha-1 mm h-1. R factor values for individual stations
can be found in the Supplement to this paper. Figure shows
average R factor values for subsets of stations based on the maximum
station elevation included in the subset. For instance, the average
R factor for the elevations up to 300 m is slightly less than
550 MJ ha-1 mm h-1 and for elevations up to 600 m slightly
more than 600 MJ ha-1 mm h-1. The average contribution of
individual months to the annual total R factor is 17 % for May, 19 %
for June, 28 % for July, 26 % for August, and 10 % for September.
Cross-validation indices for the best variants of each spatial
interpolation model. AVst – at-site average; SDst – at-site standard
deviation; AVsp – spatial average; SDsp – standard deviation of the spatial
R factor. The best cross-validation indices are marked in bold
font.
WI
RMSE
MAE
rMAE
AVst
SDst
AVsp
SDsp
IDW
0.78
164.33
123.68
0.20
625.69
162.15
649.46
165.37
SK
0.79
163.01
124.78
0.21
626.98
163.73
647.33
174.49
OK
0.79
165.96
126.48
0.21
630.72
174.62
656.59
135.70
SC
0.73
169.72
131.15
0.22
631.26
139.77
644.53
151.57
OC
0.79
158.78
121.86
0.21
632.86
158.90
656.32
138.13
RKXYZ
0.79
171.44
131.34
0.22
636.86
191.49
653.69
187.78
RKrmea,rcv
0.89
126.89
93.59
0.15
636.87
185.67
626.22
168.13
RKxyz,rmea,rcv
0.90
121.91
93.13
0.15
636.18
183.74
632.77
158.78
GLSM
0.90
115.38
90.57
0.15
635.34
175.12
628.27
149.29
GLSE
0.91
114.37
89.95
0.14
635.64
177.47
627.80
153.35
The at-site R factor estimates correspond well to those published by
and , but are considerably larger than
values recommended by the official guidelines for the Czech Republic
and those published by
. In part, differences might be due to the
modifications to the standard USLE methodology considered by
, e.g. calculation of the R factor as a
trimmed mean (excluding years with the two smallest and two largest annual
erosivity values) of the annual erosivity index. Perhaps in small measure,
the differences can also be
attributed to the different time period used for R factor assessment
(here 1989–2003; in other studies, the series from 1960 and earlier were
often considered). The estimated average R factor, after reduction for
temporal resolution (from 10 to 30 min; ),
becomes 525 MJ ha-1 mm h-1, which almost equals the average
R factor from the rainfall erosivity map of Europe –
524 MJ ha-1 mm h-1 for the Czech Republic .
In addition to different temporal resolutions, also
considered different stations and kinetic energy formulation (i.e. that of
; see Fig. for a comparison with
, used here). Although the spatial distribution of the
R factor over the Czech Republic appears rather homogeneous in
, the range of the at-site R factors (after correction
for temporal resolution) corresponds well to that from
when the station with the maximum rainfall erosivity (O1RASK01) is left out.
The effect of this station on the resulting R factor map is further
discussed in the following section.
Model selection
The estimated at-site R factor is positively correlated with average
precipitation (rmea), coefficient of variation of monthly
precipitation (rcv), and the mean fraction of precipitation above
the 95 % quantile of monthly precipitation (rp95) derived from
the gridded data (see Sect. ), with correlation
coefficients 0.75, 0.44, and 0.54, respectively. Further, a positive
correlation was also found with altitude (0.32) and longitude (0.47), and a
weak negative correlation (-0.25) with latitude. These variables have been
primarily used also as covariates in the relevant spatial interpolation
methods. Note that correlation may be substantially weaker or that different
variables may be relevant in other regions, especially regions with complex
topography (see e.g. or ).
To explore these relationships further, we analysed a set of GLS models with
various combinations of fixed component covariates and spatial stochastic
covariance structures. The number of fixed-term covariates ranged from one to
five. Two theoretical models of spatial covariance and a heteroscedastic
error model were considered (see Sect. ). The best GLS model
according to the cross-validation results of WI, RMSE, and rMAE had
four
fixed-term covariates: rmea, rp95, altitude, and
longitude, exponential spatial correlation structure, and exponential
heteroscedastic error. The coefficient of determination for regression
between at-site R factor values and R factor values predicted by this GLS
model equals 0.7, which is comparable to results of
. This was followed by a GLS model with
fixed-term covariates longitude, latitude, altitude, rmea, and
rcv and a Mathérn spatial covariance structure, which had the
lowest value of MAE. These two GLS models are further referred to as
GLSE and GLSM, respectively. The GLSE model
with estimated coefficients reads as
R=10.246rmea+2902.462rp95-y-0.130z+3649.792,
with y the latitude (km) (in the Czech S-42–Pulkovo 1942/Gauss–Kruger
zone 3 projection) and z the altitude (m).
The parameters are transformed such that the equation can be directly applied
to observed covariates. Please note that rmea and rp95
are based on (1 km) gridded data (see Sect. ). Calculating
these indices using station data would likely lead to a positive bias in the
resulting R factor. This is due to area reduction effects influencing
areal-average rainfall event characteristics see e.g.for a
discussion.
R factor estimated with the GLSE model using all
available stations (top left panel) and the GLSE model fitted
considering only stations with an R factor below
1500 MJ ha-1 mm h-1 (top right panel),
1100 MJ ha-1 mm h-1 (bottom left panel), and
1000 MJ ha-1 mm h-1 (bottom right panel). Blue points in the
top left panel indicate the full set of 96 stations; red dots in the other
panels correspond to stations excluded from the fitting
procedure.
Differences (MJ ha-1 mm h-1) between the GLSE
and other spatial interpolation models. GLSE – generalized linear
model with exponential covariance structure; GLSM – generalized
linear model with Matérn covariance structure; IDW – inverse distance
weighting; SK – simple kriging; OK – ordinary kriging; SC – simple
cokriging; OC – ordinary cokriging; RK – regression
kriging.
Figure further demonstrates the effect of excluding
station(s) with a large R factor on the interpolated R factor in the case
of the GLSE model. Although the mean and maximum R factors
decrease and local R factor patterns change slightly when stations with a
large R factor are excluded, the differences are small (especially with
respect to other sources of uncertainty; see further), suggesting our model
is rather robust.
In addition to GLS models, other spatial interpolation methods were
considered. Table presents the cross-validation indices,
spatial average, and standard deviation of the interpolated R factor. The
estimated average R factor ranges from 626 to
657 MJ ha-1 mm h-1. Comparing the R factor estimated by
different spatial interpolation models, the largest similarities were found
between the GLSE, GLSM, and
RKxyz,rmea,rcv. The GLSM and
RKxyz,rmea,rcv models had fixed-term inputs formed from
longitude, latitude, altitude, rmea, and rcv. The
correlation coefficient between estimates of GLSE and
GLSM equals 0.99, and between the GLSE model and
RKxyz,rmea,rcv models equals 0.97.
These similarities were confirmed on the rasters of differences between
values of the GLSE model and the remaining spatial interpolation
models (see Fig. ). The median absolute difference between the
GLSE and GLSM models was
20.0 MJ ha-1 mm h-1, with the standard deviation of differences
19.2 MJ ha-1 mm h-1. For the differences between the
GLSE and the best RK model, the median absolute difference and
standard deviation were roughly double.
The largest differences were found between the GLSE model and the
spatial interpolation models, which did not take into account the long-term
rainfall characteristics. The largest difference in R factor estimates
(741 MJ ha-1 mm h-1) was found between the GLSE and
OK models. Large values of MAE, rMAE, and RMSE and small values of WI for the
IDW, SK, OK, SC, and OC models show that the spatial distribution of the
R factor could not be sufficiently described by the models, which
emphasized the stochastic component of the R factor, or explain the
R factor using local information. The spatial interpolation models with
fixed-term covariates based on rmea or rcv or
rp95 were superior to the models without a fixed component linked
to the long-term rainfall characteristics (cf. the cross-validation results
of RKXYZ).
Including the stochastic information obtained from rainfall event
characteristics in simple cokriging and ordinary cokriging models also did
not improve the spatial interpolation of the R factor. The presented SC and
OC models were selected from seven different types of cokriging models. They
differed according to the rainfall event characteristic, which was used for
the second cokriging variate. The best SC and OC models were those which
linked their stochastic component with the maximum 10 min intensity and
on-site R factor. Including these rainfall event characteristics did not,
however, improve the spatial interpolation of the R factor over the spatial
models based on the long-term rainfall event characteristics (see
Table ).
The success of models including long-term precipitation characteristics might
be surprising in part because daily or subdaily rainfall data are in general
preferred for calculation of the R factor over monthly or annual data
. The potential of long-term rainfall
characteristics for R factor estimation is also stressed by
, who explored relationships between rainfall and erosivity
indices at daily, monthly, and annual timescales and concluded that the
relationship between the annual erosivity index and annual rainfall is closer
than that for the other timescales.
The average R factor varies considerably when different formulas for
kinetic energy are considered (see Fig. ). The smallest
average R factor is obtained by the RWa relationship
(500 MJ ha-1 mm h-1) and the largest value by the standard USLE
(790 MJ ha-1 mm h-1). The average
(640 MJ ha-1 mm h-1) corresponds well to the value estimated
using the van Dijk formula (the difference is
6 MJ ha-1 mm h-1). The range between estimates for individual
stations is proportional to the estimated R factor (corresponding roughly
to 40 %). This also has an effect on the estimated spatial distribution of
the R factor values.
Temporal variability
used a 22-year record to derive the rainfall erosivity factor
because of “apparent cyclical patterns in rainfall data”. The same is
repeated by with a remark that longer records are
advisable especially in the case that the coefficient of variation of annual
precipitation is large. This recommendation is often mentioned in rainfall
erosivity studies. However, due to data availability, shorter records are
often considered, at least in addition to longer records
e.g.. For a 105-year record
from Belgium, tested whether rainfall erosivity
derived from running 10- and 22-year averages is significantly different than
that from using the overall (105-year) mean. They concluded that while a
22-year period is sufficient, reliable estimates of the R factor cannot be
based on 10 years of data. Apart from their study, the actual effect of the
sample size on the estimate of the R factor was seldom investigated.
Estimated R factor (MJ ha-1 mm h-1) considering
different kinetic energy formulations (see
Appendix ).
The average confidence intervals (relative to the long-term mean
R factor, 669 MJ ha-1 mm h-1; grey area) and the coverage probability (thick lines) for
different record lengths based on station C2TREB01 – Třeboň (left
panel) and simulated data (right panel). The dotted line corresponds to the
coverage probability of 90 %.
Figure (left panel) gives the estimated confidence intervals
(grey area) together with the coverage probability, i.e. the probability that
the long-term mean R factor (669 MJ ha-1 mm h-1) will lie
within the confidence intervals for record lengths between 10 and 80 years
for the Třeboň station (C2TREB01). The 90 % confidence interval for
record length 10 years ranges from 400 to 950 MJ ha-1 mm h-1
(i.e. ±40 %), narrows to 500–870 MJ ha-1 mm h-1
(±25–30 %) for 20 years and 530–830 MJ ha-1 mm h-1
(±20–25 %) for 30 years, and remains relatively wide,
i.e. 580–770 MJ ha-1 mm h-1 (±15 %), even for an 80-year
record. Note that the temporal variability of the erosivity index is
considerably larger than in the case of the annual total or maxima. For
instance, the standard deviation of the annual erosivity index is more than
double that of the annual precipitation total and more than 40 % larger
than for annual 1 h precipitation maxima (estimated for the same station).
The coverage probability (red line in Fig. , left panel) of the
90 % confidence interval is around 75 % for the 10-year record, and from
82 % for the 15-year record it increases only slowly, with an increasing
record length of up to 87 % for the 80-year record.
The width of the confidence interval as well as the coverage probability are
influenced by the variability of the at-site erosivity index. This is
demonstrated in Fig. (right panel) for synthetic data generated
from a Gamma distribution with parameters estimated from the C2TREB01 record
(denoted ΔCV = 1) and with parameters modified such that the
coefficient of variation of the modified distribution is half and double that
of the C2TREB01 record (ΔCV = 0.5 and ΔCV = 2,
respectively), and the mean R factor remains constant. Note that the Gamma
distribution was not rejected by the Anderson–Darling test at the
0.05 significance level at most of the stations, including C2TREB01. It is
clear that the confidence intervals as well as the coverage probability for
the erosivity index simulated with parameters estimated from C2TREB01
(i.e. ΔCV = 1) correspond reasonably to those based on observed
data. It is also clear (and expected) that the width of the confidence
interval increases with the coefficient of variation. For instance, for a
15-year record and doubling of the coefficient of variation, it ranges
from 0.5 to 1.57. For increasing record lengths the coverage probability
increases and the width of the confidence interval decreases. Note that the
confidence interval for the erosivity index with a large coefficient of
variation remains relatively large (> 50 %), even for 80 years of data.
The coverage probability, on the other hand, decreases only slightly with the
coefficient of variation.
The RMSE, width of the confidence interval, the coverage probability
for a GLS model (solid lines), the “areal-average” model considering all
stations (dotted lines), and only those below 600 m (dashed lines) based on
simulated data with various record lengths and number of
stations.
Spatial and temporal coverage
Figure shows the average RMSE, coverage probability, and
relative width of the 90 % confidence interval for the validation data in
the cases of the GLS model (solid line), “areal-average” model (dotted
line), and “areal-average” model considering only stations with an altitude
of less than 600 m (dashed line). In the case of the GLS model, the RMSE for
small numbers of stations and years is relatively large
(≈ 300 MJ ha-1 mm h-1). However, it quickly drops to
≈ 100 MJ ha-1 mm h-1 for 25 stations and then
decreases almost linearly to ≈ 50 MJ ha-1 mm h-1 for
100 stations. The RMSE for both “areal-average” models depends on the
number of stations only up to 25 stations and remains almost constant for
larger numbers of stations (≈ 110 MJ ha-1 mm h-1 when
validated against all stations and
≈ 100 MJ ha-1 mm h-1 when only stations below 600 m
are used). The RMSE depends only slightly on the number of years used for
estimation of the R factor in the case of all models (GLS and both
“areal-average” models). With respect to RMSE, the “areal-average” model
is beneficial only when fewer than 20 stations are available. The RMSE for
individual stations might be considerably larger. The 90 % quantile RMSE
(not shown) is around 600 MJ ha-1 mm h-1 for both
“areal-average” models (mostly independently of the number of stations) and
from 2000 MJ ha-1 mm h-1 (10 stations) to
450 MJ ha-1 mm h-1 (30 stations) and
250 MJ ha-1 mm h-1 (100 stations) for the GLS model with
15 years of data.
The width of the 90 % confidence interval (Fig. ,
middle panel) for the GLS model drops from 120 % (5 stations) to 50 %
(25 stations) and 30 % (100 stations). This value corresponds well to the
width of the confidence interval for the full record of the
C2TREB01 (Třeboň) station (see Fig. ). The confidence
interval for both “areal-average” models is approximately half that for the
GLS model. The impact of the record length is small.
The coverage probability (Fig. , right panel) for the GLS
model increases from 70 to 90 % for 5 and 10 stations, respectively. The
coverage probability further increases only when more than 20 years of data
are used. For shorter records, the coverage probability decreases to
≈ 80 % for 100 stations. This is a consequence of faster reduction
of the confidence interval width when compared to the decrease in the RMSE.
Similarly, the coverage probability decreases for both “areal-average”
models, since the RMSE is almost constant for more than 25 stations, while
the width of the confidence interval decreases. As with the RMSE, the
coverage probability might be considerably lower for individual stations,
especially in the case of both “areal-average” models (for a number of
stations close to 0), while the coverage probability is larger than 70 %
for most of the stations in the case of the GLS model.
The results for all evaluated characteristics indicate that appropriate
spatial coverage is more important than the length of the record, at least in
situations when other relevant information to build a spatial model is
available. However, at least 15–20 years of data should be considered (if
possible) to provide reasonable coverage probabilities.
Comparison of different sources of uncertainty
Using the “validation data” and the GLS model, we calculated the
coefficient of variation (CV) associated with formulation of the kinetic
energy, spatial interpolation, and spatial and temporal coverage
(Table ). In addition, the CV was also calculated for estimates
of the R factor based on different record lengths for the C2TREB01
(Třeboň) station and for the set of 96 stations considered for
spatial interpolation. For the latter, the estimated CV was 23 % on average
(9–43 % for all stations). This value can be interpreted as an indicator
of natural variability of the R factor based on a 15-year record. Almost
the same value (21 %) is estimated for the C2TREB01 (Třeboň)
station and 15 years of data. The contribution of the kinetic energy
formulation (≈ 13 %) and spatial interpolation (≈ 9 %)
is about half of this value. As expected, the CV of the estimates for the
C2TREB01 (Třeboň) station decreases with increasing record length
(38, 16, and 12 % for 5, 50, and 80 years, respectively).
The coefficient of variation (CV) for the R factor estimated by
considering different formulas for kinetic energy, spatial interpolation
methods, record length, and spatial coverage. In the second column, the
average CV for all considered stations is given; the last two columns
indicate the range of the CV from all stations in the “validation
dataset”.
Source
Coefficient of
variation (%)
Kinetic energy
12.7
(11.8–14.4)
Spatial interpolation
9.4
(1.5–29.7)
Number of stations (GLS model, 15 years)
10 stations
31.0
(19.6–69.1)
30 stations
10.3
(7.6–21.4)
50 stations
7.6
(6.0–13.9)
Number of years (GLS model, 100 stations)
5 years
13.0
(10.9–22.3)
15 years
6.4
(4.9–11.8)
50 years
4.9
(3.4–9.5)
Number of years (C2TREB01 – Třeboň)
5 years
37.6
–
15 years
21.6
–
50 years
16.1
–
80years
11.8
–
Natural variability (96 stations, 15 years)
23.3
(9.1–43.0)
The same applies for the GLS model, for which, in addition, the CV also
decreases with an increasing number of stations. When comparing corresponding
record lengths, the CV is considerably smaller for the GLS model than for
individual stations, providing a sufficient number of stations is considered
in the model. For instance, for the R factor based on 15 years of data, the
average CV is 31, 10, 7, and 6.4 % for 10, 30, 50, and 100 stations,
respectively, while for the station data and the same record length the
average CV was 23 %. In addition, considering 100 stations, the CV is only
13 (5) % for 5 (50) years. From Table it is evident that not
only does the average CV decrease, but that the same holds also for the
stations with maximum variation. For instance, using 100 stations and
15 years, the maximum CV in the “validation set” is 12 % for a GLS model
and 43 % for the station data. As the CV decreases with more stations and
longer records, the relative importance of the expression for the kinetic
energy and spatial interpolation increases. The assessment of the sources of
uncertainty could be done more formally using an analysis of variance (ANOVA)
model e.g. or a slightly more flexible linear
mixed-effects model see e.g..
Summary and conclusions
In the present paper we estimated the rainfall erosivity factor (R factor)
for the area of the Czech Republic. The at-site values of the R factor
based on a 15-year record for 96 stations were considered in several spatial
models in order to provide estimates of the R factor for the whole area of
the Czech Republic.
The spatial interpolation models included inverse distance weighting, simple
and ordinary kriging, simple and ordinary cokriging, regression kriging with
parameters estimated by the method of moments, and the GLS models. Several
covariates have been considered to explain the spatial variation of the
R factor over the area.
In addition, uncertainty due to kinetic energy formulation, spatial models,
and spatial and temporal coverage was assessed by direct comparison of
different methods and simulation studies.
The most important findings can be summarized as follows:
the average R factor in the period 1989–2003 for the considered stations
is 640 MJ ha-1 mm h-1, with values for the individual stations
between 320 and 1520 MJ ha-1 mm h-1;
the at-site R factor is considerably correlated with average precipitation,
coefficient of variation of monthly precipitation, the mean fraction of
precipitation above the 95 % quantile of monthly precipitation, and
longitude, while the correlation with altitude and latitude is weak;
from the considered spatial models, a GLS model with altitude, latitude, mean
precipitation, and the mean fraction of precipitation above the 95 %
quantile of monthly precipitation provided the best performance according to
three of four cross-validation indices;
with respect to the cross-validation statistics, the spatial interpolation
models that included long-term rainfall characteristics performed
considerably better than those based on local interpolation and/or
geographical information only;
the resulting map based on the GLS model does not change considerably when
stations with the largest R factor are excluded;
when the number of stations and years available for interpolation is small,
the relative contribution of the uncertainty due to the kinetic energy
estimate and the spatial interpolation method is small compared to that due
to the choice of the stations and time period;
although the RMSE and confidence interval width decrease and coverage
probability in general increases with record length and number of stations,
reasonable estimates of the R factor may be obtained from relatively short
records (e.g. 15–20 years) provided a sufficient number of stations are
available and appropriate covariates can be found;
the confidence intervals around the R factor estimates remain relatively
wide even for long records, especially at locations with large natural
variability of the annual erosivity index, and are considerably wider than
those for annual rainfall total or precipitation maxima;
the spatial model should in general be preferred over the areal-averaging of
R factor estimates from individual stations unless only very short records
for a small number of stations are available or appropriate covariates cannot
be used.