Flood-initiating catchment conditions: a spatio-temporal analysis of large-scale soil moisture patterns in the Elbe River basin

Floods are the result of a complex interaction between meteorological event characteristics and pre-event catchment conditions. While the large-scale meteorological conditions have been classified and successfully linked to floods, this is lacking for the large-scale pre-event catchment conditions. Therefore, we propose classifying soil moisture as a key variable of pre-event catchment conditions and investigating the link between soil moisture patterns and flood occurrence in the Elbe River basin. Soil moisture is simulated using a semi-distributed conceptual rainfall-runoff model over the period 1951–2003. Principal component analysis (PCA) and cluster analysis are applied successively to identify days of similar soil moisture patterns. The results show that PCA considerably reduced the dimensionality of the soil moisture data. The first principal component (PC) explains 75.71 % of the soil moisture variability and represents the large-scale seasonal wetting and drying. The successive PCs express spatially heterogeneous catchment processes. By clustering the leading PCs, we identify large-scale soil moisture patterns which frequently occur before the onset of floods. In winter, floods are initiated by overall high soil moisture content, whereas in summer the flood-initiating soil moisture patterns are diverse and less stable in time.


Introduction
Flood generation and magnitude are the result of a complex interaction between meteorological conditions, such as the amount and spatial distribution of precipitation or the inflow of warm air masses, and pre-event hydrological catchment conditions, such as soil saturation and snow water equivalent (Merz and Blöschl, 2008, 2009; Brocca et al., 2008; Parajka et al., 2010; Marchi et al., 2010). In order to capture the variety of large-scale flood generation mechanisms, flood events have been classified and analyzed according to their hydrometeorological conditions along with their interactions between catchment state and meteorological conditions (e.g. Alila and Mtiraoui, 2002; Apipattanavis et al., 2010; Merz and Blöschl, 2003). These interactions vary from decade to decade (Alila and Mtiraoui, 2002), seasonally (Sivapalan et al., 2005; Merz and Blöschl, 2003; Parajka et al., 2010), and from event to event as well as from catchment to catchment (Merz and Blöschl, 2003). In addition to the occurrence and interaction of the hydro-meteorological conditions, their spatial patterns related to flooding need to be taken into account (Merz and Blöschl, 2003), which can be especially important in larger catchments (Merz and Blöschl, 2008). On the regional scale, the automated classification of meteorological conditions has already identified a close relationship between the occurrence and persistence of meteorological circulation pattern types and floods (e.g. Bárdossy and Filiz, 2005; Jacobeit et al., 2006; Petrow et al., 2009; Prudhomme and Genevier, 2011; Parajka et al., 2010).
As far as hydrological catchment conditions are concerned, several studies identified soil moisture pattern types on the local or regional scale by applying an automated classification (Kim and Barros, 2002; Jawson and Niemann, 2007; Korres et al., 2010; Ibrahim and Huggins, 2011; Perry and Niemann, 2007; Wittrock and Ripley, 1999). However, the analyzed soil moisture data (remotely sensed or ground-based point measurements) are limited either in their spatial extent, covering a small (< 1 km²) study area (e.g. Perry and Niemann, 2007), and/or in their temporal resolution (monthly/annual values or a small number of subsequent days) (e.g. Jawson and Niemann, 2007; Wittrock and Ripley, 1999). No studies are available that attempt to automatically classify the patterns of regional hydrological catchment conditions and to link them to flood initiation.
Complementary to the classification of meteorological conditions, we therefore propose the classification of the hydrological catchment conditions at the regional (Elbe) scale to get a probabilistic insight into the link between flood initiation and the hydrological catchment conditions. As soil moisture is a key variable of hydrological catchment conditions, we examine whether flood initiation in the Elbe River basin can be linked to specific soil moisture pattern types.
For the estimation of the hydrological catchment conditions concerning soil moisture, ground-based soil moisture measurements (e.g. time domain reflectometry, frequency domain reflectometry, gravimetric), remotely sensed soil moisture measurements (Brocca et al., 2009), model-based soil moisture (Norbiato et al., 2009; Merz and Blöschl, 2003) and surrogate measures such as mean annual precipitation (Merz and Blöschl, 2009; Merz et al., 2006), antecedent precipitation index (Merz et al., 2006; Brocca et al., 2009), Gradex method (Merz and Blöschl, 2008), and event runoff coefficient (Merz et al., 2006; Sivapalan et al., 2005; Merz and Blöschl, 2003, 2009) have been applied. In the present paper, the link between the hydrological catchment conditions and flood initiation in the Elbe Basin is investigated by using daily profile soil moisture simulated by a rainfall-runoff model and validated against remotely sensed soil moisture. It is assumed that the implemented rainfall-runoff model incorporates the hydrological processes that enable the estimation of soil moisture. Afterwards, a principal component analysis (PCA) and a subsequent clustering of the leading principal components (PCs) yield different pattern types. PCA is by far the most commonly applied method among the automated techniques to classify the structure of spatially variable data, and has also been applied in soil moisture pattern studies (Kim and Barros, 2002; Jawson and Niemann, 2007; Korres et al., 2010; Ibrahim and Huggins, 2011; Perry and Niemann, 2007; Wittrock and Ripley, 1999). In parallel, regional flood events are identified and linked to the derived pattern types.
The remainder of this paper is organized as follows: first, the study area and input data are described in Sect. 2. The methods to identify distinct types of daily soil moisture patterns and flood events are provided in Sect. 3. Section 4 describes the retrieved soil moisture pattern types, their characteristics as well as their relation to flood initiation. These results are discussed in the subsequent section. Section 6 summarizes our findings and gives suggestions for future research.
2 Study area and data

Study area
The Elbe/Labe River (Fig. 1) originates in the Giant Mountains at 1386 m a.s.l. in the Czech Republic, crosses northeastern Germany and reaches the North Sea after 1094 km. The Czech Republic and Germany are the main riparian states of the 148 268 km² drainage basin. Negligible parts belong to Austria and Poland. About 50 % of the Elbe drainage basin has an elevation below 200 m a.s.l. One-third is hilly country with an elevation between 200 and 500 m a.s.l. The low mountain range (500-750 m a.s.l.) accounts for 15 % and the mountain range for less than 2 %. Major tributaries are the Moldau/Vltava, contributing an average of 154 m³ s⁻¹ of river discharge (60 %) at its confluence with the Elbe River, the Eger/Ohře (38 m³ s⁻¹), the Mulde (67 m³ s⁻¹), the Saale (117 m³ s⁻¹), the Schwarze Elster (21 m³ s⁻¹) and the Havel (114 m³ s⁻¹). The Elbe River basin is situated in a transition zone between temperate (lower Elbe) and continental climate (middle and upper Elbe). Especially in the upper Elbe, the climate is strongly modified by the relief (IKSE, 2005). Mean annual precipitation in the river basin is 715 mm. However, there is a large variation within the basin. In the mountainous areas mean annual precipitation is above 1000 mm, whereas in the middle Elbe mean annual precipitation is around 450 mm. In the wintertime, precipitation falls as snow in the mountainous areas. Depending on snow depth and elevation, snow melts predominantly in March although it can persist until May, resulting in a snowmelt-influenced discharge regime. Mean annual evapotranspiration in the Elbe River basin is 455 mm (IKSE, 2005). In the highlands, thin cambisols are the main soil type, whereas in the lowlands, sandy soils and glacial sediments dominate. In the valleys, loamy soils are found. The western Elbe is covered by loess (chernozems and luvisols) (Hattermann et al., 2005). Land use is dominated by cropland (50.8 %), forest (evergreen 21.8 %, mixed 5.3 %, deciduous 3.1 %) and grassland (10.2 %). Settlements account for 6.5 % of the total basin area (CORINE European Environment Agency, 2000). Dams have been built in the Elbe headwaters and dikes have been installed along the river for flood protection. The Havel region is strongly influenced by past mining activities. Previously observed flooding was predominantly generated by snowmelt in combination with rainfall in winter and spring in the upper Elbe. In summer, large-scale flooding due to long-lasting rainfall as well as small-scale flooding due to convective events were observed (IKSE, 2005).

Data
The data used in this study include climatic data as well as soil and land use information for driving a hydrologic model. Discharge observations are utilized for hydrologic model calibration and validation as well as flood event identification.

Climatic and hydrologic data
Daily meteorological data (maximum, minimum and mean air temperature, precipitation amounts, relative humidity, sunshine and total cloud cover durations) were provided by the German Weather Service (DWD) and the Czech Hydrometeorological Institute (CHMI). In the Czech part of the basin, the climate station network is less dense. The station data were corrected for inconsistencies, data gaps and inhomogeneities (Österle et al., 2006, 2012).
The soil map was generated by merging the German soil map "BUEK 1000" provided by the Federal Institute for Geosciences and Natural Resources (BGR) and the FAO-UNESCO soil map for the Czech part. The resolution and quality of the soil maps are different, which may influence the results of the rainfall-runoff model.
The land use information is taken from the CORINE 2000 land cover data set of the European Environment Agency and is considered as static during the analysis period.
Discharge data were provided by various German water authorities and the Global Runoff Data Centre (GRDC). In total, 114 gauging stations (Fig. 1, dots), including a large number of nested catchments, were used for flood identification. While the selected gauges are densely and approximately equally distributed in the German part, only six gauges were available in the Czech part. Catchment size varies between 104 km² and 131 950 km². Half of the gauges covered at least 94 % of the analysis time period. Hydrological years with more than 60 days of missing data were excluded from the flood identification, resulting in between 57 (1951) and 114 (1981) gauges in the analysis. Of these, 27 discharge gauges (Fig. 1, red dots) were used for the calibration and validation of the rainfall-runoff model.

Scatterometer data
The remotely sensed soil water index (SWI) (Wagner et al., 1999) was provided by the Vienna University of Technology, Institute of Photogrammetry and Remote Sensing (http://www.ipf.tuwien.ac.at/radar/ers-scat/home.htm). Surface soil moisture is derived from the radar backscattering coefficient of the scatterometers on board the satellites ERS-1 (1991-2000) and ERS-2 (1995-2011). First, the backscattering coefficients are standardized to a reference incidence angle (40°) by applying a change detection method. Afterwards, they are rescaled between their minimum and maximum value to represent the driest and wettest soil moisture conditions of the topmost soil layer. The soil moisture content in the first meter of the soil (SWI) is derived by applying an exponential two-layer water model to the surface soil moisture estimates (Wagner et al., 1999). The temporal resolution of the SWI is 10 days. As soil moisture retrieval is not possible under snow and frozen soil conditions, the SWI sample times are restricted from April/May to November. In total, 228 sample times overlap with the hydrological simulation. The distance between the pixel centroids is approximately 12.5 km (Fig. 1, crosses).
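The rescaling and filtering steps described above can be sketched in a few lines. This is a minimal illustration and not the processing chain of the data provider: it assumes the commonly used exponential-filter form of the SWI, and the function names, the characteristic time length `T_days` and all sample values are hypothetical.

```python
import math

def rescale_surface_moisture(sigma0, dry_ref, wet_ref):
    """Scale a backscatter value linearly between the driest (0) and wettest (1) reference."""
    return (sigma0 - dry_ref) / (wet_ref - dry_ref)

def soil_water_index(times, ms, t_now, T_days=20.0):
    """Exponentially weighted mean of all surface soil moisture values up to t_now.

    T_days is an assumed characteristic time length; older samples get smaller weights.
    """
    num = 0.0
    den = 0.0
    for t_i, ms_i in zip(times, ms):
        if t_i <= t_now:
            weight = math.exp(-(t_now - t_i) / T_days)
            num += weight * ms_i
            den += weight
    return num / den if den > 0 else float("nan")

# Hypothetical sample: acquisition days and rescaled surface soil moisture
times = [0, 3, 6, 10]
ms = [0.2, 0.5, 0.9, 0.4]
swi = soil_water_index(times, ms, t_now=10)
```

Because the filter is a weighted mean, the resulting SWI always stays within the range of the surface values it is built from.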

Methodology
Profile (layer-depth weighted average) soil moisture content is simulated with a rainfall-runoff model. For the comparability of the profile soil moisture content of different soil types, the values are standardized by the field capacity of each soil type. In the following, the standardized profile soil moisture content is termed soil moisture index (SMI). PCA is used to map daily spatial SMI onto specific spatial patterns that express large parts of the spatial variability of the soil moisture series. Cluster analysis groups days of similar soil moisture patterns. In parallel, flood start days are derived from observed discharge time series and flood-prone soil moisture patterns are identified.
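As a small worked example of the SMI definition, the sketch below computes a layer-depth weighted profile soil moisture and divides it by field capacity. It is only an illustration: the function name and the layer values are hypothetical, and it assumes that the profile field capacity is depth-weighted in the same way as the soil moisture.

```python
def soil_moisture_index(layer_moisture, layer_depth, layer_field_capacity):
    """Layer-depth weighted profile soil moisture, standardized by field capacity."""
    total_depth = sum(layer_depth)
    profile_sm = sum(m * d for m, d in zip(layer_moisture, layer_depth)) / total_depth
    # Assumption: field capacity of the profile is averaged with the same depth weights
    profile_fc = sum(f * d for f, d in zip(layer_field_capacity, layer_depth)) / total_depth
    return profile_sm / profile_fc

# Hypothetical two-layer soil: volumetric moisture, layer depth (m), field capacity
smi = soil_moisture_index([0.30, 0.20], [0.3, 0.7], [0.35, 0.30])
```

An SMI of 1 then corresponds to a profile at field capacity, which makes values comparable across soil types.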

Flood event identification
For the investigation of the link between large-scale soil moisture patterns and flood initiation, a flood definition that takes into account the simultaneous or time-shifted flooding at several gauges is required. Several flood identification methods taking the spatio-temporal coherence of flooding into account have recently been proposed by e.g. Rodda (2005), Merz and Blöschl (2003), Keef et al. (2009), Uhlemann et al. (2010) and Ghizzoni et al. (2012). We identified large-scale flood events in the Elbe River basin using an approach proposed by Uhlemann et al. (2010), as the method is not restricted to a certain return period and takes the simultaneous or time-shifted occurrence of peak discharges at many sites into account. A flood event is identified if at least one gauge within the catchment exceeds its 10-yr flood (POT). In a subsequent step, one searches for further significant peak discharges in a time frame three days in advance and ten days after the date of the POT. All peak dates around the POT are pooled into one flood event. Two flood events are independent from each other if the last occurrence of a significant peak of the previous flood and the first significant peak of the following flood event are separated by at least four days. Otherwise they are considered as one flood event.
In this way, each flood event is characterized by a flood start date (first gauge showing significant peak around POT) and a flood end date (last gauge showing significant peak around POT). Furthermore, each flood is characterized by a measure of the overall event severity. The severity measure combines the overall flood extent and the flood magnitude (for details see Uhlemann et al., 2010).
To explore the link between flood initiation and the occurrence of soil moisture patterns, the soil moisture patterns at the start dates of the respective flood events are examined.
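The pooling and independence rules described in this section can be sketched as follows. This is a simplified illustration and not the implementation of Uhlemann et al. (2010): it assumes that significant peaks have already been detected at all gauges, it ignores the severity measure, and the function name, argument names and sample data are hypothetical.

```python
def identify_flood_events(peaks, before=3, after=10, min_gap=4):
    """Pool significant peaks into flood events.

    peaks: list of (day, exceeds_10yr_flood) for significant peaks at any gauge.
    Returns a list of (start_day, end_day) tuples, one per independent event.
    """
    pot_days = sorted(day for day, is_pot in peaks if is_pot)
    all_days = sorted(day for day, _ in peaks)
    events = []
    for pot in pot_days:
        # Pool all significant peaks from 3 days before to 10 days after the POT
        members = [d for d in all_days if pot - before <= d <= pot + after]
        start, end = members[0], members[-1]
        if events and start - events[-1][1] < min_gap:
            # Fewer than four days apart: not independent, merge with previous event
            events[-1] = (min(events[-1][0], start), max(events[-1][1], end))
        else:
            events.append((start, end))
    return events

# Hypothetical peak days; True marks an exceedance of the 10-yr flood
peaks = [(10, False), (12, True), (20, False), (23, True), (60, True), (65, False)]
events = identify_flood_events(peaks)
```

In this example the first two exceedances share the peak on day 20 and are therefore merged, while the third forms a separate event. The first element of each tuple is the flood start day that is later matched with a soil moisture pattern.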

Model description
The continuous daily eco-hydrological model SWIM (Krysanova et al., 1998) is a conceptual, semi-distributed model based on SWAT (Arnold et al., 1993) and MATSALU (Krysanova et al., 1989). The model has three levels of spatial disaggregation: the basin (entire considered river basin), subbasins (subdivision of the basin) and hydrotops. Hydrotops are units of unique land use and soil type which are assumed to have a specific hydrological reaction. Climate input and the groundwater routine act on the subbasin scale. Daily values of relative humidity, global radiation, precipitation, mean, maximum and minimum air temperature are interpolated at the subbasin centroids. The snow routine and the soil water balance are calculated at the hydrotops and river routing at the basin scale. In total, 1945 subbasins with a median catchment size of 33 km² (minimum 2 km², maximum 1034 km²) were implemented upstream of Wittenberge (Fig. 1).
The snow module is based on the degree-day method. Snow accumulation and melt depend on threshold air temperatures and a degree-day factor. Surface runoff is calculated with a modified version of the SCS-curve number method (Arnold et al., 1993; USDA Soil Conservation Service, 1972). In the soil routine, the soil root zone is subdivided into several soil layers in accordance with the soil profile of the specific soil type. To calculate percolation, a storage routing technique is applied on water inflow divided into slugs of 4 mm. Percolation in each layer depends on the soil water content, which has to exceed field capacity, and on the travel time through the layer governed by the saturated hydraulic conductivity. If the subjacent layer is saturated or if the soil temperature in a layer is below 0 °C, no percolation occurs. Lateral subsurface flow is a function of the remaining drainable water volume and the return flow travel time, which depends on the baseflow factor and on the saturated conductivity. If the considered soil layer is saturated, water is assumed to rise to the overlying layer.
Potential evapotranspiration is estimated with the Priestley-Taylor method (Priestley and Taylor, 1972). Based on potential evapotranspiration, plant transpiration and soil evaporation are calculated separately as a function of the leaf area index according to Ritchie (1972). The actual soil evaporation is estimated on the upper 0.3 m of the soil zone. As long as the soil evaporation accumulated since the last rainfall event is below 6 mm, actual soil evaporation equals potential soil evaporation. If the soil evaporation exceeds 6 mm, actual evaporation is assumed to decay exponentially. In the case of a snow cover, the soil evaporation is retained from the snow water content. For the estimation of the actual plant transpiration, the potential water use by plants based on the root development is calculated first. Secondly, potential water use is adapted to actual water use by the soil water content and field capacity.
The groundwater module consists of a shallow and a deep aquifer. The shallow aquifer is recharged by the percolation from the bottom soil layer with an exponential delay weighting function. The return flow is the groundwater contribution to the streamflow from the shallow aquifer. The amount of seepage from the shallow aquifer to the deep aquifer and the capillary rise from the shallow aquifer back to the soil profile are estimated as linear functions of recharge and actual evapotranspiration, respectively.
Routing is calculated with the Muskingum method (Maidment, 1993, chap. 10.2.3). Surface runoff and the sum of subsurface and groundwater are routed separately.
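For readers unfamiliar with Muskingum routing, a minimal sketch of the textbook recursion for a single reach is given below. It is not taken from the SWIM source code: the storage constant `K`, the weighting factor `X`, the time step `dt` and the inflow hydrograph are hypothetical values chosen for illustration.

```python
def muskingum_route(inflow, K=2.0, X=0.2, dt=1.0):
    """Route an inflow hydrograph through one reach with the Muskingum method.

    K: storage constant (same time unit as dt), X: weighting factor (0 to 0.5).
    """
    denom = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / denom
    c1 = (dt + 2.0 * K * X) / denom
    c2 = (2.0 * K * (1.0 - X) - dt) / denom  # c0 + c1 + c2 equals 1
    outflow = [inflow[0]]  # assume steady state at the first time step
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow

# Hypothetical daily inflow hydrograph (m3/s)
inflow = [10.0, 10.0, 50.0, 30.0, 15.0, 10.0, 10.0, 10.0]
outflow = muskingum_route(inflow)
```

The routed hydrograph has a lower and later peak than the inflow, which is the attenuation and translation effect the method is meant to capture.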

Model calibration and validation
The Elbe River basin is subdivided into 27 regions, assuming homogeneous parameterization within each region (Fig. 1). The model parameters (Table 1) are calibrated over the period 1981-1989 and validated over 1951-1980 as well as 1990-2003. Parameters were estimated for each region progressively from upstream to downstream by nesting the upstream regions, for which parameters are already estimated, in the next step.
A weighted Nash-Sutcliffe efficiency coefficient (Hundecha and Bárdossy, 2004; Nash and Sutcliffe, 1970) was used as an objective function:

$$\mathrm{NS}_w = 1 - \frac{\sum_{t} w(t)\,\left(q_c(t) - q_o(t)\right)^2}{\sum_{t} w(t)\,\left(q_o(t) - \bar{q}_o\right)^2}$$

where $q_c$ and $q_o$ are simulated and observed discharge, respectively. $\bar{q}_o$ is the average observed daily discharge in the calibration/validation period. $w(t)$ gives weight to certain parts of the hydrograph. To emphasize high flows, $w(t)$ equals $q_o(t)$. The first 90 days of the simulation period are used as model initialization and excluded from the efficiency calculation.
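The objective function can be evaluated with a few lines of code. The sketch below follows the verbal definition given above (weights equal to observed discharge, warm-up period excluded); the function name and the sample series are hypothetical.

```python
def weighted_nse(q_sim, q_obs, warmup=90):
    """Weighted Nash-Sutcliffe efficiency with w(t) = q_obs(t).

    The first `warmup` time steps are treated as model initialization and skipped.
    """
    sim = q_sim[warmup:]
    obs = q_obs[warmup:]
    mean_obs = sum(obs) / len(obs)
    num = sum(o * (s - o) ** 2 for s, o in zip(sim, obs))
    den = sum(o * (o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

# Hypothetical short series, evaluated without warm-up for illustration
q_obs = [10.0, 12.0, 40.0, 25.0, 15.0, 11.0]
q_sim = [11.0, 13.0, 35.0, 27.0, 14.0, 12.0]
score = weighted_nse(q_sim, q_obs, warmup=0)
```

Because each squared error is multiplied by the observed discharge, a given error on a flood peak lowers the score more than the same error during low flow.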
As several parameter combinations can lead to the same model performance (Beven and Binley, 1992), a Monte Carlo uncertainty analysis is carried out to identify an ensemble of parameter sets that lead to behavioral model performances. A model performance is considered as behavioral if an a priori set threshold of goodness of fit is exceeded.
The Monte Carlo simulation results in i data matrices SMI (m × n), where m is the number of observations in time (i.e. 18 993), n is the number of subbasins (i.e. 1945) and i refers to the number of behavioral parameter combinations (Monte Carlo parameter sets).
In a next step, the simulated SMI is validated. The verification with soil moisture point measurements (e.g. gravimetric) is not feasible as these are highly variable over short distances, whereas satellite-based and hydrologically simulated soil moisture estimates are spatially integrated values (Parajka et al., 2006). For this reason, the temporal SMI progression is validated against the remotely sensed soil water index (SWI) by calculating the Pearson correlation coefficient between SMI and SWI. In accordance with the area share of overlying SWI grid points, an area-weighted average SWI is assigned to each subbasin. For each subbasin, between 122 and 197 SWI estimates are available for model validation.

Principal component analysis
The standardized soil moisture simulations of the behavioral Monte Carlo runs are arranged consecutively in time, reshaping SMI into a matrix SMI* of size

$$(i \times m) \times n.$$

PCA is applied to reduce the data dimensionality of SMI*. First, the spatial linear Pearson correlation matrix R (n × n) of SMI* is calculated, giving equal weight to all subbasins and Monte Carlo parameter sets. As R is square and symmetric and therefore diagonalizable, one can identify the eigenvectors u (specific spatial patterns expressing large parts of the spatial SMI variability) and the eigenvalues diag(λ) of the matrix R:

$$R\,u = \lambda\,u$$

by solving

$$(R - \lambda I)\,u = 0,$$

where I is the (n × n) identity matrix. The eigenvectors u are sorted in decreasing order of their corresponding eigenvalues λ, as the eigenvalue $\lambda_k$ is a measure of the explained variance $ev_k$ of the corresponding principal component $PC_k$:

$$ev_k = \frac{\lambda_k}{\sum_{j=1}^{n} \lambda_j}.$$

Hence, the leading first eigenvector points in the direction of the highest variance of SMI*, and the next eigenvector explains the subsequent highest variance under the condition of being orthogonal to the already identified eigenvector. The PCs are obtained by projecting the standardized (zero mean, unit variance) data matrix of SMI* onto the eigenvector $u_k$.
As SMI* comprises all behavioral Monte Carlo runs, uncertainty bounds of each $PC_k$ are obtained by decomposing $PC_k$ of size ((i × m) × 1) into i time series of length m.
The PCs are tested for significance using the rule-N approach (Overland and Preisendorfer, 1982). The calculated normalized eigenvalues are compared against the normalized eigenvalues of a random Gaussian matrix. Those leading normalized eigenvalues that are higher than the 95th percentile of the simulated random eigenvalues (1000 Monte Carlo runs) are treated as significantly different from a random field. For further details on PCA see e.g. Hannachi et al. (2007), Jolliffe (1986) or Preisendorfer (1988).
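The chain from data matrix to leading PC can be illustrated on a toy example. The sketch below standardizes the columns (subbasins), builds the correlation matrix and extracts only the leading eigenpair. It uses a plain power iteration as a stand-in for a full eigendecomposition, which is an assumption made to keep the example dependency-free and not necessarily how the analysis was implemented; all names and the three-subbasin sample are hypothetical.

```python
import math

def standardize(columns):
    """Return columns with zero mean and unit (population) variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / std for x in col])
    return out

def correlation_matrix(z_cols):
    """Pearson correlation matrix of already standardized columns."""
    m = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / m for cj in z_cols] for ci in z_cols]

def leading_eigenpair(R, iterations=500):
    """Power iteration; assumes the start vector is not orthogonal to the solution."""
    n = len(R)
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        v = [sum(R[r][c] * u[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    lam = sum(u[r] * sum(R[r][c] * u[c] for c in range(n)) for r in range(n))
    return lam, u

# Hypothetical SMI series: one list per subbasin, one value per day
smi = [
    [0.9, 0.8, 0.5, 0.3, 0.4, 0.7],
    [0.8, 0.8, 0.4, 0.2, 0.5, 0.6],
    [0.7, 0.9, 0.6, 0.3, 0.3, 0.8],
]
z = standardize(smi)
R = correlation_matrix(z)
lam1, u1 = leading_eigenpair(R)
ev1 = lam1 / len(R)  # the trace of a correlation matrix equals n
pc1 = [sum(z[j][d] * u1[j] for j in range(len(z))) for d in range(len(z[0]))]
```

Here `ev1` is the share of variance explained by the first PC and `pc1` is its daily time series. With several Monte Carlo runs stacked in time, `pc1` would be cut back into one series per run to obtain the uncertainty bounds.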

Cluster analysis
The hierarchical Ward cluster algorithm (Ward Jr., 1963) is implemented on the leading PCs to identify days of similar soil moisture patterns. At the beginning, every day D represents a single cluster. At each analysis step, the union of all possible cluster pairs is considered and the cluster pair t that offers the smallest increase in variance V is merged:

$$V = \sum_{t} \sum_{D \in t} \sum_{k} \left(PC_{tDk} - \overline{PC}_{tk}\right)^2 \qquad (7)$$

where $PC_{tDk}$ denotes the value of the k-th PC at day D belonging to cluster t and $\overline{PC}_{tk}$ is the mean of the k-th PC over all days in cluster t. The algorithm merges the days consecutively until all days are united in a single cluster. Including all behavioral Monte Carlo runs, the extent of a single PC is ((i × m) × 1), i.e. ((38 × 18 993) × 1). Due to limits in the computer capacity, it is not feasible to accomplish a cluster analysis directly on the leading PCs. Thus, we execute the cluster analysis in two steps. First, cluster analysis is carried out on the leading PCs of each behavioral Monte Carlo run separately. To merge the cluster results of the behavioral Monte Carlo runs, a second cluster analysis is applied on the cluster centroids (median of respective cluster members) of the behavioral Monte Carlo runs. It is assumed that the number of clusters in the first and second cluster step is the same.
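The merging rule of Ward's method can be made concrete with a small, naive implementation. The sketch below is an illustration of the criterion only, not of the two-step procedure described above, and it scales poorly with the number of days. It uses the standard identity that merging two clusters increases the within-cluster sum of squares by |A||B|/(|A|+|B|) times the squared distance between their centroids. The function name and the sample points (days described by two PCs) are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion (naive version)."""
    clusters = [[p] for p in points]  # every day starts as its own cluster

    def centroid(cluster):
        dim = len(cluster[0])
        return [sum(p[k] for p in cluster) / len(cluster) for k in range(dim)]

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if a and b are merged
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters

# Hypothetical days described by their first two PC values
days = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
groups = ward_clusters(days, n_clusters=2)
```

Since the PCs enter the distance unstandardized, a PC with larger variance contributes more to the merge cost, which is how the first PC receives the highest weight in the clustering.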
Depending on the parameterization of a particular Monte Carlo run, a single day D may have different pattern characteristics and is thus assigned to different clusters t. Subsequently, each day D is assigned to the cluster with the highest probability of occurrence, expressed as $p_D$:

$$p_D = \max_{t} \left( \frac{n_{Dt}}{i} \right) \qquad (8)$$

where $n_{Dt}$ is the number of behavioral Monte Carlo runs that assign day D to cluster t and i is the number of behavioral parameter sets. In order to estimate the influence of model parameterization on the cluster t, the median $p_D$ of all days belonging to a specific cluster t is calculated, which defines the probability of cluster membership $p_t$:

$$p_t = \operatorname{median}_{D \in t} \left( p_D \right) \qquad (9)$$

Thus, a small probability of cluster membership $p_t$ indicates a strong influence of the model parameterization on the cluster assignment, while a large probability of cluster membership $p_t$ indicates a weak influence. To determine the number of leading PCs in the clustering as well as the number of clusters (Eq. 7), a sensitivity analysis is conducted varying the number of PCs and the number of clusters. Based on $p_t$ (Eq. 9), the optimum PC-cluster combination is selected.
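The two membership measures can be computed directly from the cluster labels of the individual Monte Carlo runs. The sketch below assumes that the labels have already been harmonized across runs (the role of the second clustering step); the function name and the toy label matrix are hypothetical.

```python
from collections import Counter
from statistics import median

def cluster_membership(assignments):
    """assignments[r][D] is the cluster label of day D in Monte Carlo run r.

    Returns the final cluster of each day, p_D per day and p_t per cluster.
    """
    n_runs = len(assignments)
    n_days = len(assignments[0])
    day_cluster, p_day = [], []
    for day in range(n_days):
        counts = Counter(run[day] for run in assignments)
        label, count = counts.most_common(1)[0]  # most frequently assigned cluster
        day_cluster.append(label)
        p_day.append(count / n_runs)
    p_cluster = {
        label: median(p for p, c in zip(p_day, day_cluster) if c == label)
        for label in set(day_cluster)
    }
    return day_cluster, p_day, p_cluster

# Hypothetical labels: 4 Monte Carlo runs (rows) and 3 days (columns)
runs = [[1, 2, 2], [1, 2, 1], [1, 1, 2], [1, 2, 2]]
day_cluster, p_day, p_cluster = cluster_membership(runs)
```

In this toy case all runs agree on the first day, so cluster 1 has a membership probability of 1, whereas the days in cluster 2 are supported by three of four runs each.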

Identified flood events
From the observed discharge time series, 94 flood events are identified, out of which 60 % are winter (November-April) events. Figure 2 displays the flood events separated by the month of the flood start date and the severity class s. In February, March, June and December more than ten flood events are initiated. September and October have by far the lowest number of flood initiations. In November, no flood events are initiated. High severities (s > 100) are found in the winter months December as well as March and in the summer months June to August. The severe events are not restricted to the months with the highest number of flood initiations. As the severity is a combined measure of flood magnitude and extent (affected river network), one has to take their respective influence into account. Winter events are characterized by a large spatial extent of minor magnitudes. In contrast, summer events are either characterized by a small spatial extent of very few extreme magnitudes or by a large spatial extent of miscellaneous magnitudes.

Hydrological modeling
The application of the Monte Carlo approach resulted in 38 behavioral parameter sets. In the calibration period, all gauges have a median weighted Nash-Sutcliffe efficiency between 0.55 and 0.8. The performance difference between the behavioral Monte Carlo parameter sets is negligible. Their median weighted Nash-Sutcliffe efficiency ranges between 0.71 and 0.74. In the validation period, the gauges' median weighted Nash-Sutcliffe efficiency ranges between 0.53 and 0.81 (1951-1980) as well as 0.26 and 0.87 (1990-2003). Gauge Havelberg has by far the lowest efficiency, which can be attributed to various lakes and strong anthropogenic modifications (mining) not represented in the model. Due to the chosen model performance measure, the calibration puts more emphasis on high flows compared to low flows. As a consequence, runoff volume is in the median overestimated by 33 % in the calibration period and by 28 % in the validation period 1951-1980 (40 % in 1990-2003).
For the validation of the SMI, the Pearson correlation coefficient (significance level 5 %) between the median simulated SMI and the remotely sensed SWI was calculated (Fig. 3). Except for a few local spots, basin-wide high correlations are observed. The median correlation is 0.57; the difference between the 25th and 75th percentiles is 0.2. The individual examination of the Monte Carlo parameterizations leads to the same findings. Hence, the simulated temporal progression of the SMI is well represented in the hydrological model.

Principal component analysis
The leading 43 out of 1945 PCs are significantly different from a random field, explaining 97.66 % of the total variance. The leading four PCs and their corresponding eigenvectors are displayed in Fig. 4. The eigenvectors indicate geographic regions of simultaneous anomalies in the SMI. The minimum and maximum values of the PCs correspond to the parameter uncertainty introduced by the rainfall-runoff model.
The first PC explains 75.71 % of the total variance and has a seasonal behavior. The influence of the parameter uncertainty on the PC is small. The corresponding loadings show a low spatial variability across the catchment (Fig. 4, top). The subsequent PCs have a damped and lagged seasonal behavior and their loadings are spatially heterogeneous. The second PC explains 8.60 % of the total variance; seasonality is still visible. Compared to the first PC, the influence of the parameter uncertainty on the PC increased, although the general behavior is the same. Its corresponding loadings show a north-south partition. The German part of the river basin has positive loadings excluding the upstream areas of the Saale and Mulde, whereas the Czech part of the river basin has negative loadings. The third PC, explaining 1.87 % of the total variability, has no apparent periodic behavior. High positive loadings are found in the Saale region and the mountainous area of the catchment. The fourth PC shows the highest influence of parameter uncertainty of the presented PCs. The explained variance is 1.49 %. The loadings are positive in large parts of the Saale region, slightly positive in the central Czech part of the catchment and highly negative in the downstream Havel region.

Cluster analysis
Up to 15 of the leading PCs were chosen as variables in the cluster analysis to identify days of similar soil moisture patterns. This implies that between 75.71 % (first PC) and 94.10 % (all leading 15 PCs) of explained variance are involved in the clustering. Applying the variance criterion of Ward (Eq. 7), the weight assigned to a particular PC depends on its explained variance, as the individual PCs are not standardized, i.e. the first PC has the highest weight.
The probability of cluster membership $p_t$ (Eq. 9), which evaluates the influence of model parameterization, was used to select an optimum PC-cluster combination. Figure 5 (left) displays the median $p_t$ of different PC-cluster combinations. A division in two clusters leads in the median to well distinguishable clusters independent of model parameterization (median $p_t$ ≈ 1). The median $p_t$ decreases up to a division in approximately 12 clusters and remains stable for a higher number of clusters. In general, the median $p_t$ is independent of the number of PCs in the cluster analysis. An exception is the clustering of three or fewer PCs. In this case, the included variability is insufficient to derive a high number of distinguishable clusters, as indicated by a low median $p_t$. In the following, the number of PCs in the cluster analysis is fixed to four since the variability (86.67 % of explained variance) is sufficient to derive a high number of distinguishable clusters.
Henceforward, clustering the leading four PCs, Fig. 5 (right) displays the distribution of $p_t$ for different cluster numbers. In almost all cases, clusters with a very high $p_t$, and thus independent of model parameterization, and clusters with low $p_t$ (< 0.5), and thus dependent on model parameterization, can be found. In a subsequent step, the analysis is restricted to ten clusters, since the median $p_t$ peaks for ten clusters (0.74) and decays for more clusters.
The SMI patterns of the cluster centroids, when separating the leading four PCs into ten clusters, are presented in Fig. 6 dominate in the mid- and lowlands (pattern six). The SMI rises in the upstream part of the basin (pattern five). The northern part of the basin gets wetter (pattern four) and the SMI in the upstream part of the basin increases additionally (pattern three). Here, 28.4 % of the catchment area has an SMI above 0.8. Finally, cluster nine shows high SMI over the entire basin: 43.8 % of the catchment area has an SMI above 0.8 and 90.1 % of the catchment area has an SMI above 0.6.
Table 2 displays the statistics of the ten clusters. Each day is represented 38 times, corresponding to the number of behavioral parameter sets. Cluster nine is the largest cluster, containing 24.1 % of total days. Clusters two and seven are the smallest clusters, containing around 5 % of total days. The probability of cluster membership p_t ranges between 0.58 and 1.0. Cluster nine has the highest p_t, and clusters two and six have the lowest. For the latter, in the median 22 out of 38 parameter sets assign a respective day to the cluster. In order to indicate the day-to-day variability of the classification, the median persistence of each pattern type/cluster is calculated. Cluster nine has the highest persistence: in the median, the pattern persists for eleven days (average of 43 days). With a median duration of seven days, clusters three and seven have the second highest persistence. The other pattern types have a median duration of either four or five days.
The monthly frequencies of the different pattern types are presented in Fig. 7 (top). The frequencies express the relative occurrence of a particular pattern type in each month. Seasonal differences between the pattern types are visible. For instance, clusters three and nine are winter patterns, whereas cluster seven occurs in summer. Cluster four predominates in April, whereas cluster one predominates in June. In winter, and in particular at the beginning of the year, clusters three and nine are the dominant patterns, whereas in summer/autumn the occurring clusters are more varied.
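The persistence statistic can be sketched as follows, assuming that persistence is the length of an uninterrupted run of days assigned to the same cluster. The function names and the toy label sequence are hypothetical; the study does not specify its implementation.

```python
from itertools import groupby
from statistics import mean, median


def run_lengths(daily_labels):
    """Map each cluster to the lengths of its runs of consecutive days."""
    runs = {}
    for cluster, group in groupby(daily_labels):
        runs.setdefault(cluster, []).append(sum(1 for _ in group))
    return runs


def persistence(daily_labels):
    """Return {cluster: (median run length, mean run length)} in days."""
    return {c: (median(r), mean(r)) for c, r in run_lengths(daily_labels).items()}


# Toy sequence of daily cluster labels for one parameter set.
labels = [9, 9, 9, 9, 3, 3, 9, 9, 4, 4, 4, 3]
for cluster, (med, avg) in sorted(persistence(labels).items()):
    print(cluster, med, avg)
```

A mean far above the median, as reported for cluster nine (eleven versus 43 days), indicates that a few very long runs dominate the average.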

Soil moisture patterns and their relationship to flood initiation
In the following, the soil moisture pattern types are characterized according to their relationship with flood initiation. Since the flood start days are considered for all parameter sets separately, 38 times 94 flood start days are included in the analysis. Cluster nine comprises the highest percentage (52.9 %) of flood start days (Table 2). Clusters eight and ten contain less than 2 % of flood start days each. Cluster seven contains nearly no flood start days. The frequency of the flood start days within each cluster expresses how often the respective pattern type can be related to flood initiation.
Cluster nine has the highest frequency of flood start days (1.08 %). The frequency of flood start days in cluster eight is one order of magnitude lower. Compared with the unclustered data, flood start days are concentrated in cluster nine, and the remaining clusters account for relatively few flood start days. The relative frequency of flood start days within a respective month and pattern type is presented in Fig. 7 (bottom). In summer, the highest relative frequencies of flood start days can be found and a multiplicity of pattern types is related to flood initiation. In winter, cluster three and in particular cluster nine, both characterized by high soil moisture across the entire basin, are related to flooding. Here, the relative frequency of flood start days is approximately constant from January until April/May. Although the two clusters have approximately the same seasonal distribution (Fig. 7, top), cluster nine is primarily relevant for flood initiation in winter, whereas cluster three is of primary importance for flood initiation in summer. In June, cluster three has the highest relative frequency of flood start days of all clusters. This seasonality shift between pattern frequency in general (Fig. 7, top) and pattern flood initiation (Fig. 7, bottom) also occurs in the case of clusters four and two.
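The two statistics used above are easily confused: the share of all flood start days that falls into a cluster (52.9 % for cluster nine) and the frequency of flood start days among the days of that cluster (1.08 % for cluster nine). The following sketch computes both for toy data; the function name, the data layout and the numbers are hypothetical and serve only to separate the two definitions.

```python
from collections import Counter


def flood_start_statistics(cluster_of_day, flood_start_days):
    """cluster_of_day: cluster label per day index; flood_start_days: set of day indices."""
    days_per_cluster = Counter(cluster_of_day)
    starts_per_cluster = Counter(cluster_of_day[d] for d in flood_start_days)
    n_starts = len(flood_start_days)
    stats = {}
    for cluster, n_days in days_per_cluster.items():
        n = starts_per_cluster.get(cluster, 0)
        stats[cluster] = {
            # share of all flood start days that belong to this cluster
            "share_of_starts": n / n_starts if n_starts else 0.0,
            # flood start days relative to all days of this cluster
            "frequency": n / n_days,
        }
    return stats


# Toy data: ten days in cluster 9, ten days in cluster 8, four flood start days.
toy = flood_start_statistics([9] * 10 + [8] * 10, {0, 1, 2, 15})
for cluster in sorted(toy):
    print(cluster, toy[cluster])
```

A cluster can hold a large share of flood start days simply because it is large, so the within-cluster frequency is the more informative measure of how strongly a pattern is related to flood initiation.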
Besides the pattern type frequency and the relationship to flood initiation, the pattern type persistence three days before and after the flood start date was investigated (not shown). On the one hand, the patterns after the flood start date are more persistent than in the days ahead of the flood start date. On the other hand, there is a clear difference in the pattern persistence between summer and winter events. In wintertime, the soil moisture pattern types are persistent both before and after the flood start date. Occasionally, the patterns shift from cluster three into cluster nine and vice versa in the days ahead of the flood start date. In summer, either pattern type three is persistent in advance of the flood start date or a continuous wetting (transformation from pattern six, five or four) occurs. The summer patterns of low SMI are either persistent or transform into wetter patterns after the flood start date.
Finally, the relationship between the soil moisture pattern types and flood severity is analyzed (Table 2). For high SMI (patterns three, four and nine) the median flood severity increases with the degree of saturation. For dry soil moisture conditions this is not the case. Cluster eight has a median flood severity of 21.7, whereas the wetter pattern six has a median flood severity of 7.4.

Soil moisture pattern classification
Large-scale soil moisture dynamics were simulated with a hydrologic model. As the model was calibrated on discharge, an integrated measure of catchment processes, the spatially distributed processes within the gauged catchment may not be well represented and small-scale patterns may be a calibration artifact. In the present study, the temporal SMI progression was of particular relevance, since the PCA was based on the spatial correlation matrix (Eq. 3). The validation of the simulated SMI with the remotely sensed SWI confirmed the basin-wide reliability of the temporal SMI progression. As there is no obvious difference in the correlations between the Czech and the German part of the basin, an impact of the different soil maps and climate station densities on the temporal progression of simulated soil moisture can be excluded. Applying a different hydrologic model in Austria, Parajka et al. (2006) obtained considerably lower correlation coefficients of root zone soil moisture (medians of 0.07 for 1991-1995 and 0.12 for 1996-2000). As soil and land use properties are included in the model, there is a physical meaning behind the soil moisture patterns. For this reason, we did not interpolate the SMI on a regular grid as done by other studies (e.g. Perry and Niemann, 2007; Kim and Barros, 2002). Rather, we conducted the PCA at the subbasin scale with the drainage divides as natural boundaries.
The large amount of total variability explained by the first PC indicates that there is one dominant mode controlling the temporal soil moisture variability. This temporal mode has a seasonal behavior with maximum values in winter and spring and minimum values in summer (Fig. 4, top right). These seasonal soil moisture changes are mainly attributable to seasonal changes in evapotranspiration, leading to soil moisture depletion in summer and a rise in winter and spring (Parajka et al., 2010). Additionally, snowmelt may have an impact. All subbasins respond in the same direction in terms of seasonality, indicating considerable similarity in the processes controlling soil moisture variability (Fig. 4, top left). Heterogeneous hydrological processes and catchment properties are overruled by seasonality. Perry and Niemann (2007) reported similar findings for the leading PC. Limiting the analysis to a smaller area receiving evenly distributed precipitation would possibly identify PCs attributable to single precipitation events. As the loadings are not only positive across the catchment but also approximately of the same magnitude, the first PC is a measure of the catchment average SMI. In contrast, the subsequent PCs are a measure of the disparity, describing local variations departing from the regional value. Their damped and lagged temporal progression might be attributable to the antecedent soil moisture conditions, i.e. the previous spatial distribution of precipitation.
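Why near-uniform loadings make the first PC a measure of the catchment average can be seen from a short calculation. The notation is assumed for illustration and is not taken from the paper's equations: z_i(t) denotes the standardized SMI of subbasin i on day t, n the number of subbasins, and e_{i1} the loading of subbasin i on the first, unit-length eigenvector.

```latex
% First PC as the projection of the standardized SMI onto the first eigenvector
\mathrm{PC}_1(t) = \sum_{i=1}^{n} e_{i1}\, z_i(t), \qquad \sum_{i=1}^{n} e_{i1}^{2} = 1 .

% If all loadings are positive and of about equal magnitude, e_{i1} \approx 1/\sqrt{n}, so
\mathrm{PC}_1(t) \approx \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i(t) = \sqrt{n}\,\bar{z}(t),
\qquad \bar{z}(t) = \frac{1}{n} \sum_{i=1}^{n} z_i(t) .
```

Under this assumption the first PC is, up to the constant factor \sqrt{n}, the spatial mean of the standardized SMI, whereas the subsequent PCs, being orthogonal to a near-constant vector, describe deviations from that mean.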
The PCA showed that the retrieved soil moisture patterns have a stronger signal than the parameter uncertainty introduced by the rainfall-runoff model (Fig. 4, right). The first PC has the smallest relative difference between the members of the parameter ensemble and is least influenced by model parameterization. The relative differences between the ensemble members are higher for the subsequent PCs, revealing a greater impact of model parameterization on spatially heterogeneous hydrological processes, such as infiltration and soil storage, than on the seasonal progression of soil moisture.
The cluster analysis identified days of similar soil moisture patterns. Each soil moisture pattern type can be attributed a pattern frequency, a pattern persistence and seasonal characteristics (Table 2, Fig. 7) in the same manner as in weather type classifications (e.g. Philipp et al., 2010). The probability of cluster membership p_t yielded an objective choice of the number of PCs as well as clusters with respect to the parameter uncertainty of the rainfall-runoff model. Pattern nine, characterized by high soil moisture content over the entire catchment, is least dependent on model parameterization (highest p_t) and most persistent. Independent of model parameterization, any additional rainfall will be transformed into runoff without altering the pattern type. Dry SMI patterns have a high p_t too. Likewise, the absence of rainfall does not lead to an alteration of the pattern type. Intermediate SMI patterns show the lowest p_t and are comparatively less persistent, as these patterns are a transitional stage to either wetter or drier soil moisture conditions.

Soil moisture patterns and their relationship to flood initiation
In flood frequency analysis and design value estimation, flood events are defined for one particular gauge. An extreme value distribution is fitted to annual maxima series or a peak-over-threshold series of observed discharge. In the present study, the flood identification differed from this commonly applied approach. Instead, a flood identification method that takes into account the large-scale response has been adopted (Uhlemann et al., 2010). This is a more suitable approach in terms of the linkage between floods and large-scale soil moisture pattern types. Among the 94 detected flood events, the most severe events are well documented in terms of hydro-meteorological conditions, e.g. the flood events of August 2002 (Engel, 2004; Ulbrich et al., 2003a, b), July 1954 (Hauptamt für Hydrologie, 1954; Boer et al., 1959), or December 1974/January 1975 (Schirpke et al., 1978). Nevertheless, the flood event set may be incomplete due to ungauged small catchments receiving convective rainfall, as well as the requirement of an observed 10-yr flood. Furthermore, the flood event set may be biased towards flooding in the German part of the Elbe catchment, as only six gauges were available in the Czech Republic.
Previously, Uhlemann et al. (2010) highlighted the occurrence of summer flooding in the Elbe Basin. However, when analyzing trans-basin German floods, they found that the severe events were limited to the winter period. Our study, which is limited to flood identification in the Elbe, results in events of high severity (s > 100) both in winter and summer. Analyzing annual maximum flood series in Austria, Merz and Blöschl (2009) did not detect a dependence between the flood moments and seasonality. Instead, mean annual flood flows were positively correlated with average antecedent rainfall as a surrogate for the catchment soil moisture state (Merz and Blöschl, 2009).
Comparing the soil moisture pattern types and their respective median flood severity (Table 2), the results are heterogeneous and reflect different flood-generating processes. On the one hand, the median flood severity increased for patterns of high SMI. In this case, the catchment storage capacity is exceeded, e.g. due to long-term low-intensity rainfall, and any additional rainfall results in a runoff increase. In addition, if most parts of the catchment are saturated, a larger number of gauges may be flood affected in case of synoptic rainfall, as shown on the pan-European scale by Prudhomme and Genevier (2011). In their regional flood typology, Merz and Blöschl (2003) termed this type long-rain flood. On the other hand, the relatively dry soil moisture pattern eight shows high median severity too. This is attributable to high-intensity rainfall on relatively dry catchment conditions, which may lead to flash floods (Merz and Blöschl, 2003).
The flood-generating processes are revealed in the seasonality of the soil moisture pattern types. In winter, flooding is related to high soil moisture content in the entire basin (cluster nine). The respective soil moisture pattern is persistent before and after the flood start date, indicating flooding either due to long-term low-intensity rainfall (long-rain flood) or saturated soils due to snowmelt (snowmelt flood). In summer, the flood-initiating soil moisture patterns are more variable and less persistent. On the one hand, large-scale flooding is observed. Long-lasting rainfall leads to high soil moisture content over large parts of the basin (cluster three). On the other hand, convective events (flash floods) characterized by relatively dry catchment conditions (cluster eight) occur (IKSE, 2005).
In addition to soil moisture, other patterns, in particular precipitation, are relevant for flood initiation (Merz and Blöschl, 2008, 2009; Brocca et al., 2008; Parajka et al., 2010; Marchi et al., 2010). This is indicated by the seasonal cluster distribution and the deviating seasonality of the flood start days inside the cluster (Fig. 7, e.g. cluster three). Furthermore, when extending the pattern classification approach to e.g. precipitation patterns, the role of saturated soils limited to parts of the catchment when receiving rainfall may become more pronounced.

Conclusions
Flood generation and magnitude are the result of a complex interaction between the meteorological situation and pre-event hydrological catchment conditions. The impact of catchment conditions on floods and flood severity is expected to depend on various factors, such as season or flood type (e.g. snowmelt flood, long-rain flood, flash flood), and is especially difficult to decipher in large river catchments with catchment-internal variation in flood generation. To date, no studies are available that attempt to understand the space-time behavior of large-scale hydrological catchment conditions and link them to flood initiation. As a step in this direction, we propose classifying the hydrological catchment conditions and linking flood occurrence to large-scale catchment state patterns. This approach is complementary to the widespread classification of circulation patterns in meteorology. As soil moisture is a key variable of hydrological catchment conditions, model-simulated soil moisture was used to answer two questions: what is the dominant space-time behavior of soil moisture at the regional catchment scale? Are there soil moisture patterns that are related to large-scale flood initiation?
By applying PCA, the dimensionality of the soil moisture data was reduced to four PCs representing 86.67 % of the total soil moisture variability. The seasonal wetting and drying of the catchment represented by the first PC is the dominant mode, whereas the successive PCs describe spatially heterogeneous catchment processes. Cluster analysis assessed the similarity in daily soil moisture distribution within the catchment, and assigned each day of the investigation period to one of ten soil moisture pattern types. In parallel, 94 start days of large-scale flood events were identified and enabled a probabilistic linkage to large-scale soil moisture patterns. The following conclusions can be drawn:
1. Different soil moisture patterns are not equally associated with flood occurrence.
2. Patterns with catchment-wide high soil moisture accumulate the majority of flood start days.
4. Seasonality of soil moisture pattern frequency and seasonality of soil moisture pattern flood initiation are not identical.
5. Occurrence of a certain soil moisture pattern does not necessarily lead to flood initiation, but the probability of occurrence of a large-scale flood may be increased.
While these results underline the importance of catchment state for flood initiation and severity, (4) and (5) indicate that, besides soil moisture, other patterns are relevant for flood initiation. Therefore, future work will extend the pattern classification approach not only to circulation patterns but also to snow. A combination of hydro-meteorological pattern types would make it possible to quantify the interaction of patterns of hydrological catchment conditions and meteorological conditions on flood initiation and magnitude.

Fig. 1. Topographic map of the Elbe River basin. Yellow dots: discharge gauges applied in flood identification. Red dots: discharge gauges applied in flood identification as well as hydrologic model calibration and validation. Crosses: location of pixel centroids of the scatterometer data.

Fig. 2. Number of flood start dates per month and severity class s.

Fig. 4. Eigenvectors (left) and their corresponding PCs (right) of the leading four PCs. PCs are displayed for the sub-period 1982-1991. Minimum and maximum values correspond to the parameter uncertainty introduced by the rainfall-runoff model.

Fig. 5. Median probability of cluster membership p_t of different PC-cluster combinations (left). PC-cluster combinations with a small median p_t are strongly influenced by model parameterization. Distribution of p_t for different numbers of clusters when clustering the leading four PCs (right).

Fig. 7. Frequency of occurrence per month of different soil moisture pattern types [%] (top). Relative frequency of flood start days per month and respective soil moisture pattern type [‰] (bottom).

Table 1. Model parameters and their calibration range.