Interactive comment on “ Large-scale regionalization of water table depth in peatlands optimized for greenhouse gas emission upscaling ”

Fluxes of the three main greenhouse gases (GHG) CO2, CH4 and N2O from peat and other soils with high or- ganic carbon contents are strongly controlled by water ta- ble depth. Information about the spatial distribution of water level is thus a crucial input parameter when upscaling GHG emissions to large scales. Here, we investigate the potential of statistical modeling for the regionalization of water lev- els in organic soils when data covers only a small fraction of the peatlands of the final map. Our study area is Germany. Phreatic water level data from 53 peatlands in Germany were compiled in a new data set comprising 1094 dip wells and 7155 years of data. For each dip well, numerous possible predictor variables were determined using nationally avail- able data sources, which included information about land cover, ditch network, protected areas, topography, peatland characteristics and climatic boundary conditions. We applied boosted regression trees to identify dependencies between predictor variables and dip-well-specific long-term annual mean water level (WL) as well as a transformed form (WLt). The latter was obtained by assuming a hypothetical GHG transfer function and is linearly related to GHG emissions. Our results demonstrate that model calibration on WLt is su- perior. It increases the explained variance of the water level in the sensitive range for GHG emissions and avoids model bias in subsequent GHG upscaling. The final model explained 45 % of WLt variance and was built on nine predictor vari- ables that are based on information about land cover, peat- land characteristics, drainage network, topography and cli- matic boundary conditions. Their individual effects on WLt and the observed parameter interactions provide insight into natural and anthropogenic boundary conditions that control water levels in organic soils. Our study also demonstrates that a large fraction of the observed WLt variance cannot be explained by nationally available predictor variables and that predictors with stronger WLt indication, relying, for exam- ple, on detailed water management maps and remote sensing products, are needed to substantially improve model predic- tive performance.


Introduction
Greenhouse gas (GHG) emissions from organic soils can be high compared to mineral soils.In Germany, the fraction of organic soils classified as peatland covers only 5 % of the land surface, but does account for 40 % of GHG emissions in the reporting categories "agriculture" and "land use, land use change and forestry" of the UN Framework Convention on Climate Change (UNFCCC) (UBA, 2012).Also, other organic soils with a lower soil organic carbon content (SOC) but still meeting the definition of organic soils according to IPCC (2006) are important sources of persistently high GHG emissions (Leiber-Sauheitl et al., 2014).In our study, we also consider these soils.For simplification, we will refer in the following to the total of peatlands and "other organic soils" as organic soils.Current estimates of GHG emissions from organic soils are fairly uncertain and reporting of most countries relies on IPCC default emission factors (EF) for CO 2 emissions which are stratified for land use and climatic region, e.g., 10 t C ha −1 year −1 for arable land in the warm temperate zone.
Artificial drainage turns the function of former natural peatlands from a C sink into a C source.Experimental work with organic soils during the last 2 decades showed that the aerated soil pore space above the water level is one of the key variables explaining the amount of CO 2 emissions (Moore and Dalva, 1993).Frequently, the water level relative to soil surface (further simply referred to as "water level", with negative values below ground) is used as proxy for air-filled porosity, given the simplicity and availability of water level measurements.Additionally, low water levels and oxygen availability are also key drivers of nitrous oxide (N 2 O) production in organic soils (Regina et al., 1996), which increases the relevance of organic soils for climate change mitigation policy.During anaerobic conditions when water levels are at or above the land surface, substantial methane (CH 4 ) emissions can occur (Levy et al., 2012).
It is postulated that the GHG budget -the sum of the CO 2 -equivalents of the three main greenhouse gases (CO 2 , N 2 O, CH 4 ) -is at a minimum for annual mean water levels (annual mean further defined by the variable name WL) at about −0.05 to −0.1 m (Drösler et al., 2011).Following atmospheric sign convention, a positive sign stands for net emissions, while a negative sign indicates a net uptake of GHGs.Other parameters, such as physical and chemical soil properties and vegetation, also influence the amount of the emissions and thus weaken the relation between total GHG budget and WL.
If available, information about the spatial distribution of WL can identify GHG hot spot regions and improve the accuracy of the total GHG budgets at large scales.The application of transfer functions that relate GHG emissions to WL and potential other influencing site characteristics can refine the estimates derived from simple application of IPCC default EFs.However, in many countries and regions, as for example Germany and Europe, a map of WL in organic soils does not exist.The spatial availability of measured WL is much higher than that of measured GHG fluxes, which suggests the use of WL as scaling parameter for upscaling GHG emissions.
Several methods were applied in the past to produce WL maps.Their suitability is strongly related to data availability, which very often decreases in quality and spatial density with increasing scale of the study area.Spatially distributed process-based modeling (Thompson et al., 2009) and semiphysical statistical approaches (Bierkens and Stroet, 2007) are able to reproduce well the water level dynamics in wetlands environments including peatlands.However, they heavily rely on spatial information about the system's physical properties and boundary conditions (peat hydraulic properties, hydraulic conductivity of peat base, drainage system), data that is often only available with sufficient detail at a regional scale (Limpens et al., 2008).Despite this difficulty there are studies in which process-based models were applied to model peatland water levels at a large scale (national or continental).Gong et al. (2012) adopted a common soil-vegetation-atmosphere transfer model to account for the differing hydrological processes in pristine fens, pristine bogs and drained peatlands, and modeled water level fluctuations in boreal peatlands for all Finland.But calibration and validation with data from only three mires does not allow for conclusions about the accuracy and general applicability of the model.Numerous large-scale hydrological wetland models are often developed with a focus on delineating wetland extent (Melton et al., 2013).TOPMODEL-based schemes (Ju et al., 2006) and more advanced large-scale hydrologic frameworks (Fan and Miguez-Macho, 2011) are suited to model WL but do not account for anthropogenic drainage and thus are only applicable to pristine (or nearly pristine) peatland systems.
When detailed physical model input that is needed for a physically based approach is lacking, statistical or machinelearning tools represent a promising alternative (Finke et al., 2004).Potential predictor variables that are available at the final map scale are determined for each location with water level data and the algorithm identifies dependencies between potential predictors and target variables, such as WL or other statistical values that describe water level dynamics.For areas rich in water level data, e.g., the Netherlands, residuals of the statistical model can afterwards be analyzed for spatial correlation.If this is present, it can be used to correct for spatially correlated model bias by kriging.This scheme has been applied to agricultural areas by Finke et al. (2004) and to nature conservation areas by Hoogland et al. (2010).Spatial interpolation approaches can include ancillary data such as mapped geophysical parameters (Buchanan and Triantafilis, 2009).Statistical approaches strongly rely on both the quantity and quality of the data on the target variable itself, i.e., the water level data.An important quality criterion for water level data from organic soils is the measurement depth.It is crucial that there is little or no hydraulic resistance by a low conductive layer between the perforated part of the monitoring well and the fluctuating water level.If hydraulic resistance is too high, the monitoring well acts as a piezometer and water levels may substantially differ from the actual phreatic level as shown for peatlands by van der Gaast et al. (2009).If such piezometer data is part of a data set and interpreted as phreatic water level data during model calibration, this can lead to an under-or overestimation of predicted water levels in organic soils.An underestimation of water level predictions (too dry) is discussed for Dutch modeling studies in van der Gaast et al. (2009).
At present in Germany, a map of water levels in organic soils that could be used for GHG upscaling is lacking.This fact and current efforts on improving GHG emission estimates for German organic soils were the main drivers for our study.Thus, the major goal of this study was the development of a model concept that produces a water level map at the scale of all organic soils in Germany that is specifically optimized for water level ranges to which GHG emissions react sensitively.We emphasize that the objective of our study was to regionalize annual mean water levels and not the GHG emissions themselves.The latter are influenced by more site characteristics, in particular soil properties.Furthermore, we suppose that annual mean water level is probably not the only or optimal statistical measure to describe the water level effect on annual GHG emissions.However, we are not aware of well-established information about transfer functions that relate more complex statistical measures of water level dynamics to GHG emissions.Therefore, we here focused on the simple and frequently applied "annual mean water level".
As a first step, we compiled a new data set of phreatic water level time series of organic soils with contributions from numerous data sources.Based on this data, we developed a modeling approach for the annual mean water level that follows the basic idea of the statistical regionalization presented in Finke et al. (2004).However, the data coverage in our study substantially differed from their study.Our data covers only a small fraction of the peatlands of the final map and spatial interpolation of residuals was not possible.We thus extended their approach by: -including additional possible predictor variables, -using boosted regression trees as a modeling tool to identify the influence of both numerical and categorical variables simultaneously, -applying a new weighting scheme that balances out heterogeneous water level data sets with highly variable spatial data density, -transforming the annual mean water level, WL, into a transformed annual mean water level, WL t , that shows a linear relationship with the GHG budget and optimizes model calibration for the WL range relevant for GHG emissions and -restricting the water level regionalization to phreatic water levels of organic soils.
We present a detailed analysis of the influence of the individual predictor variables on water levels of organic soils as well as their interactions.Furthermore, the manuscript includes the estimation of model uncertainty and possible paths of future model improvement.Finally, the calibrated model is used to derive a map of WL t for all organic soils in Germany, and the regionalization results are presented.
2 Data set and methods

Data set of phreatic water levels in organic soils
Available data of phreatic water levels in organic soils are scarce.In contrast to data of rather deeply drilled observation wells of official groundwater monitoring networks, short peatland observation wells of only 1 or 2 meters length that measure the phreatic water level of the peat layer are currently not collected in central data management systems in Germany or any of its federal states.With a comprehensive questionnaire started in 2011, we collected water level time series of organic soils from local agencies, non-governmental organizations, universities, consultants and other sources, and combined this data with water level data from our projects.Time series included manual and automatic measurements.Years with less than six measurements or data gaps of more than 3 months were excluded.Water level time series of each dip well were visually checked on plausible dynamics by comparing with data from neighboring dip wells and weather data time series.Based on auxiliary data and local knowledge, we further identified dip wells that reached down to the underlying aquifer.If dip wells failed these quality checks, they were removed from the data set.
The final data set comprised 7155 years of data from 53 German peatlands and 1094 dip wells.On average time series ranged over 7 years.All time series were collected at some period between the years 1988 to 2012.Data are well distributed over most of the German peatland regions and cover the three major types of organic soils (Fig. 1).Compared with the distribution of the types of organic soils in Germany, the fraction of dip wells on bogs is overrepresented in the data set by the factor of 2.5, while dip wells on fens and other organic soils are slightly underrepresented.Data also cover the common land-use types (for data sources see Table 1).However, dip wells on organic soils that are neither used for agriculture, forestry or peat mining, further referred to as "unused peatlands", are overrepresented in the data set by a factor of 6 as data was collected more frequently and in higher spatial data density in the frame of conservation projects.The fraction of unused peatlands of the German organic soils is 6 % and the fraction in the data set is 36 %.In contrast, dip wells on arable land are underrepresented in the data set by a factor of 6.The fraction of arable land on German organic soils is 24 %, and the fraction in the data set is 4 %.The other two key land-use types of organic soils in Germany, grassland and forest, are well represented in the data set.The misbalance of the land-use types in the data set is accounted for in the weighting of data (see Sect. 2.3.2).
If land use changed within the measurement period of a dip well, the time series was split at the moment when the land-use record indicates the transition.For each segment the annual mean water level, WL (here with negative values defined as water levels below ground), was calculated as the multi-year average value over the whole measurement period of the specific land use.
The primary application of the WL map produced in this study is for the upscaling of long-term GHG emissions as emission reporting may only reflect anthropogenic effects, but not interannual climatic effects.As GHG transfer functions are developed on annual data, their application requires both the long-term annual mean water level, as well as its interannual variability.Due to the non-linear dependence of GHG emissions on WL, single years with extreme water levels can strongly influence long-term average GHG fluxes.This study is focused on the regionalization of the long-term annual mean water levels.For this objective, model building should be based on long-term water level time series to average out the effect of weather variation within a complete climatic period (commonly 30 years).The existing nationally available data on water level time series of organic soils, however, does not comprise a single time series with complete data coverage over the last 30 years.Due to the lack of sufficient long-term water level time series, we included all time series in the model building process.Average climatic boundary conditions (precipitation, reference evapotranspiration, water balance) of the specific measurement period of each dip well are part of the predictor variables (see Sect. 2.2), and thus are supposed to partly account for the effect of specific weather conditions on WL in case of short measurement periods.

Predictor variables
Spatial coverage of phreatic water level data of organic soils is too low to obtain WL maps by simple spatial interpolation (Fig. 1).Additional spatial data is needed as basis for regionalization.Ancillary information that covers fully or at least most of the extent of the final map are necessary.They can be used as predictor variables.A comprehensive set of variables (numerical and categorical) with potential indication for the hydrological condition of an organic soil were determined for each dip well (Fig. 2 and Table 1).
The predictor variables, which can partly be found also in Finke et al. (2004), can be divided into seven groups:

Land cover
As certain land use and vegetation require and reflect certain WL, such information can be used as an indicator for the average drainage level around the dip well.Land-use and vegetation information is based on the German Digital Landscape Model (ATKIS Basis-DLM), which is updated continuously by aerial photos as well as sporadic ground mapping and has a temporal accuracy of 3 months to 5 years.It is provided as fine-scaled polygons and represents the best uniform land cover information available in Germany.It contains information on primary land-use type, few optional vegetation attributes and whether "wet soil" has been observed during mapping.As we noticed that the use of a large number of categorical variables lowers the performance of boosted regression trees, we further aggregated the three information types (i) land use, (ii) vegetation and (iii) wet soil into a set of nine combined land cover classes (Table 1).These land cover classes were a trade-off between fine differentiation and the number of replicates in each class.For grasslands, a "wet grassland" class was separated when grassland was overlaid with wet soil and/or tree or shrubs vegetation, which may indicate a less intensive management.Forests overlaid with wet soil were separated as "wet forest".Further, unused peatlands overlaid with wet soil and showing no coverage with tree attributes were characterized by higher water levels and were thus separated as "wet unused peatland".The very few dip wells classified as open water (n = 2) and peat cutting (n = 5) were merged to the reed and arable land cover class, respectively.Land-use type and land cover class were extracted at the dip well (point extraction) and as fractions in various buffers around the dip well (Table 1).As using too many weak predictor variables lowers model performance and increases overfitting, the numerous land cover fractions were further aggregated into two classes: the fraction of dry (arable and grassland) and wet (reed, wet grassland, wet forest and wet unused peatland) land cover on organic soils.For the calculation of the fraction of dry land cover, we tested various factors for the reduction of the contribution of grassland compared to arable land as the grassland class also includes wetter grasslands that could not be detected with the available land cover catalogue.A factor of 0.5 was an optimal value, which was then set fixed.

Drainage network
Locations of ditches that are included as lines in the Digital Landscape Model were used to obtain information about the drainage network.The total ditch length was calculated for various buffer sizes.Further, the distance to the next ditch was calculated for each dip well.A short distance to the next ditch may indicate either lower or higher water levels, depending on whether the ditches are used for drainage or already blocked and used for rewetting measures.Similarly, the indication of the total length of ditches is not unique.Therefore, we defined two different sets of ditch variables.A first set, for which we calculated values for all land cover classes and a second one, for which we only calculated values for land cover classes for which ditches are undoubtedly used for drainage, i.e., arable and grassland.

Peatland characteristics
The geological map of Germany (scale 1 : 200 000) defined the area for which WL predictions were modeled.It is also the basis for topological peatland predictor variables, i.e., the fraction of organic soils in different buffer sizes as well as the dip well distance to the edge of the peatland.Information about the peatland type and the substrate at the peat base is presented in more detail in a newly compiled raster map of organic soils (Roßkopf et al., 2014) and was thus extracted from this map.Peatland types were aggregated into five classes: lowland bog (North German Plains and Alpine Forelands), upland bog (Central Uplands and Alps), fen neighboring surface water, fen without neighboring surface water, and a class of "other organic soils" that do not fulfill the C content and thickness criteria to be classified as peatland.Substrates at the peat base included loose unconsolidated rock (alluvial sand and gravel deposits), consolidated rock (bedrock) and peat clay layer.The first type may indicate the occurrence of seepage (positive or negative), whereas the latter two types may indicate rather a hydraulic decoupling from the aquifer hydraulic head.

Climatic boundary conditions
Climatic boundary conditions directly influence water level.On the one hand, the typical long-term climatic boundary conditions may indicate the general vulnerability of peatlands in a specific region.On the other hand, given the different lengths of measurement periods of the time series in this study, climatic boundary condition predictor variables may account for the effect of a climatically wetter or drier measurement period, compared to the long-term averages, on the water level.Climatic boundary conditions were extracted from a 1 × 1 km raster from the German Weather Service.Annual, summer and winter precipitation, FAO56 Penman-Monteith reference evapotranspiration and climatic water balance (difference between precipitation and reference evapotranspiration) were determined for the individual measurement period of each dip well and as long-term averages (30 years).

Relative altitude
Relative altitude was calculated by subtracting the median altitude of various buffer sizes from the absolute altitude at each dip well in the digital elevation model (DEM).Relative altitude is expected to have two different indications depending on the applied buffer size: (i) in many peatlands, the former smooth peatland relief at the scale of approximately > 5 m has been disturbed due to peat cutting and differences in drainage and mineralization rate.As a consequence, the rather smooth phreatic surface often does not follow the uneven and patchy terrain.Relative altitude with respect to smaller buffer sizes (< 250 m) may therefore explain part of the WL variation, e.g., a dip well that is located at a surface much higher than the surroundings may indicate deeper water levels; (ii) for large buffer sizes (> 250 m) relative altitude indicates whether the peatland lies in a larger morphological depression or elevation, and thus may indicate whether largescale lateral inflow of water can be expected or not.Similar indication is provided by the topographic index (see below).The accuracy of relative altitude values depends on the resolution and accuracy of the DEM.The nation-wide available DEM is based on data sets of varying quality, which may lower the influence of this variable.

Topographic wetness index
The topographic wetness index is a common wetness indicator used in hydrology (Beven and Kirby, 1979).It is a combined measure of catchment area and slope at a given point and indicates the extent of flow accumulation.High values indicate wetter conditions.If calculated at larger scales, higher values may hint at the occurrence of positive seepage, i.e., upward flow of water from the aquifer.Topographic wetness index was calculated for various DEM resolutions using the GRASS 7 module r.watershed.

Protection status
The protection status of a peatland area may reflect hydrological conditions.Therefore we checked for seven protection status at each dip well (see Table 1 for details).

Model building scheme
Model building was performed using boosted regression trees (BRT), implemented in the two R packages "gbm" (Ridgeway, 2013) and "dismo" (Hijmans, 2013).BRT is a machine-learning algorithm, in which the final model is derived from the data.Functions that relate target to predictor variables are not predetermined but freely developed.BRT is based on the decision (or regression) tree concept.In the decision tree concept, the parameter space is searched sequentially for the best split that results in the lowest model mean squared error.The mean responses of the groups that result from the various splits, and correspond to certain parameter ranges, represent the model.The common procedure is the growth of a large tree which is subsequently simplified by dropping weak links that are identified with crossvalidation.Growing only a single tree has several disadvantages such as uneven functions that are very sensitive to the specific sample of the data.Therefore, ensemble techniques have been combined with the decision tree concept.These were, first, the development of multiple models by bootstrapping of the samples (bagging technique) and the random creation of subsets of predictors at each split (random forest technique).Later, with the "boosting" technique of BRT, a sequential procedure was developed in which data is reweighted after each tree to increase emphasis on data that is poorly modeled by the existing collection of trees (Elith et al., 2008).
BRT modeling is increasingly applied in spatial modeling of species or numerical environmental variables (Elith et al., 2008, Martin et al., 2011), thereby often showing superior performance compared to other machine-learning algorithms.The increasing application of BRT is related to several of its favorable characteristics: the strength of this method lies in the ability to fit complex functional dependencies including non-linear relationships and interactions between predictor variables.Based on its flexibility, BRT is invariant to monotonic transformations of predictors.Furthermore, BRT allows for missing values in the predictor variables, thus predictor variable information does not necessarily need to fully cover the total map extent.The gbm package handles missing values in predictor variables by introducing surrogate splits.The mean target value belonging to the missing predictor values is attributed to these surrogate splits during model building.We observed that the contribution of a predictor variable to the final model decreases with an increasing number of missing values.This is intuitive as target observations of missing predictor values are mostly supposed to scatter strongly.BRT is further fairly insensitive to outliers and allows estimating the relative contribution of each predictor variable to the model.Due to these characteristics we expected BRT to be very well suited to the very heterogeneous data set of this study.
BRT model calibration is prone to overfitting, and there are various options to reduce this behavior.Due to the overfitting behavior, cross validation is generally part of the model building process.However, cross validation can be performed in several ways and, if performed carelessly, can lead to overly optimistic model performance (De'ath, 2007).Here, cross validation was performed by leaving out whole peatland areas instead of a random set of dip wells.This represents a stricter cross validation, and we noticed that it strongly reduced overfitting of the water level data, and thus contributed to the development of a more robust model.Another option to avoid overfitting is to impose monotonic slopes on the effects of individual parameters, which can even lead to improved prediction performance (De'ath, 2007).For all our numerical variables, we expected monotonic slopes rather than optimum functions.To avoid predefining any expected direction, all numerical variables were added twice to the set of predictors, constraining the slope to a monotonic increase and decrease.We let the model decide whether monotonic increase or decrease has higher predictive power.
Models were calibrated using a Gaussian response type, aimed at minimizing deviance (squared error) (Ridgeway, 2013).In all calibration runs, we applied the gbm.step function of the dismo package, which assesses the optimal number of boosting trees using cross validation.We tested various learning rates (0.001-0.01), bag fractions (0.1-0.8) and levels of tree complexity (3 to 7), i.e., the number of nodes in a tree.By trial and error we determined the most effective algorithm parameters for our data set being 0.005 for the learning rate, 0.6 for the bag fraction and 5 for the tree complexity.
The final BRT model building is commonly performed as a two-step procedure (Elith et al., 2008) which we basically also followed in our study: i.In the first step, the whole set of predictor variables is used to calibrate a BRT model.
ii.In a second step, the number of parameters is reduced sequentially to avoid overfitting and to derive a more parsimonious model.We tracked predictive performance criteria during the simplification process.As various variables were calculated for different buffer sizes, our predictors included a large number of correlated variables.et al., 2013).Thus, we performed this simplification process by first dropping those parameters with a correlation > 0.7 (either Pearson or Spearman type) to another parameter with a higher contribution (Clapcott et al., 2011).This ensured that two highly correlated parameters would not remain in the parameter set longer than the last parameter of another group of variables, which may contribute less compared to the two highly correlated parameters but provides extra information that is not covered by the other parameters.After all highly correlated parameters have been dropped, further parameters with low contribution were dropped progressively.
Predictor contributions are calculated as proportional contributions to the total error reduction and can be considered as a measure for the influence of the individual predictors.Additionally, a BRT model allows the derivation of partial dependence plots which indicate how the response is affected by a certain predictor after accounting for the average effects of all other predictors in the model (Elith et al., 2008).These plots do not show the full effect of each parameter on the model response due to interactions with other parameters that are fixed to derive theses plots as well as due to parameter cocorrelation.However, they can be used for interpreting model behavior (Elith et al., 2008).

WL t : transformation of WL
The map of water levels of this study was developed to improve the upscaling of greenhouse gas emissions from organic soils.Therefore, the final map should provide the highest accuracy for the water level range for which the highest differences of greenhouse gas emissions occur.This can be achieved by transforming WL into a transformed variable WL t , which shows a linear relationship with GHG emissions.The sensitivity of greenhouse gas emissions to water level has been analyzed in several laboratory and field experimental and monitoring studies (Berglund and Berglund, 2011;Drösler et al., 2011;Hahn-Schöfl et al., 2011;Leiber-Sauheitl et al., 2014;Moore and Roulet, 1993;Moore and Dalva, 1993;van den Akker et al., 2012).General trends are a strong increase of methane (CH 4 ) emissions for annual mean water levels of approximately > −0.1 m and an increase of CO 2 emissions for water levels < −0.1 m with a trend similar to a saturation function that levels out approximately between −0.4 and −0.8 m (Fig. 3a).While studies agree over these general trends, the exact shape of the transfer function and the maximum levels of emissions as well as their dependence on soil properties and other environmental parameters are still controversial.Here, we assume a hypothetical transfer function, relating the normalized GHG budget, ranging from 0 to 1, to the water level (see also Fig. 3),
As the GHG budget can be positive for both low and high WL, we introduced the transformed water level, WL t , as (Fig. 3), By calibrating the model to both WL and WL t , we test if the optimization of WL t provides the highest model accuracy for the water level range relevant for GHG emissions and if it optimizes the map for application to GHG upscaling.

Weighting scheme
When considering possible data weighting schemes, it is worth emphasizing, at this point, that the goal of this study is the development of a statistical model that can explain both the water level variability within a peatland as well as among different peatlands.The data of target and predictor variables for building this model is highly heterogeneous.First, the target variable data set contains peatland areas that strongly differ in their spatial extent and in the number of installed dip Hydrol.Earth Syst.Sci., 18, 3319-3339, 2014 www.hydrol-earth-syst-sci.net/18/3319/2014/ wells.Second, the predictor variable data set contains categorical and numerical data, and part of the predictor variables predominantly vary from peatland to peatland (e.g., climatic boundary conditions, large-scale topographic wetness index, peatland characteristics) whereas others also show withinpeatland variability (e.g., land use, small-scale topographic wetness index, drainage network).As the influence of the individual predictor variables on our target WL t is expected to be rather diffuse due to abundant interactions with other site characteristics, the robustness of derived dependencies will strongly depend on the number of different peatlands in the data set.
There are no universal data weighting rules for similarly heterogeneous data situations and some degree of expert judgment and subjectivity is inevitable involved when developing an appropriate scheme (Francis, 2011).The need for introducing a data weighting scheme is obvious, as without data weighting during calibration, too much influence would be given to small and well-studied peatlands, which will reduce predictive model performance for large, lesswell-studied peatland areas.To avoid this in a simple manner, weight could be reduced by the number of dip wells in each peatland, which results in each peatland being equally weighted.This scheme, however, does not sufficiently use the high information content provided by well-studied large peatlands, which should have a higher impact on model calibration than a small peatland with only few dip wells.
Here, we propose a new weighting scheme that takes into account both factors, peatland size and local density of dip wells, to derive dip well specific weighting factors.It is based on principles of data uncertainty reduction, by repeated measurements, and of geostatistics.First, we consider our data situation as an analogue of meta-analysis with grouped data.It is has been shown for homogeneous problems (all data from same population) that optimal group weights for metaanalysis is 1/SE 2 (Hedges and Olkin, 1985) with SE being the standard error of each group, where σ e is the error standard deviation of a measurement and N is the number of measurements in a group.For homogeneous problems and uniform σ e , this results in weights that are linearly dependent on N , which we here call the first end member of weighting.Heterogeneity (within-group variance) reduces the variation of the group weights which can be shown by random effect models (Cumming, 2012).As with second end member of weighting, when heterogeneity totally dominates within-group variance, optimal group weights are uniform for all groups, i.e., weights are independent of N. We are not aware of a method that allows the estimation of the degree of heterogeneity for the complex target and predictor data situation in this study, including data (spatial and temporal variability, measurement error) and model errors (missing parameters).As a trade-off between 1/SE 2 (homogeneous end member) and 1 (heterogeneous end member), we decided on a group weight that is the inverse of the standard error, 1/SE, which is, for example, often used in econometric studies (Dickens, 1990).We emphasize that this is a subjective decision.The group weight, 1/SE, is the basis for the geostatistical part of our weighting scheme.There are two reasons why we cannot directly treat our peatlands as groups.First, there is within-peatland variability, which is related to changing site characteristics.It is one objective of our study to describe this variability by statistical modeling.Thus, dip wells must be treated individually and data cannot be aggregated at the peatland level.Second, we expect the model to learn more when the same number of dip wells is installed in a larger peatland.In a small peatland, spatial autocorrelation between dip wells is higher, i.e., the information content is lower than for large peatlands.As a consequence of the first point, we do not aggregate and keep all dip wells in the target variable data set by attributing to each dip well the fraction 1/N of its group weight, so that the relative weights of the groups remain constant.As a consequence of the second point, we use principles of geostatistics in our weighting scheme.We replace the group size N (positive integer number) by the "statistical" group size n (positive continuous number being > 1), which we derive from the spatial autocorrelation among the dip wells.
Therefore, we analyze the spatial autocorrelation structure of the data set.A single spherical variogram model was fitted to the sample variogram of all data (Fig. 4 in Sect.3.1).Variogram models allow the differentiation of the total data variance (called "sill") into a spatially uncorrelated variance (called "nugget") and a spatially correlated variance (called "structural variance" and defined as sill-nugget) (Wackernagel, 2003).The variogram model allows for derivation for any distance between two locations the average squared difference of values, here defined as γ .By definition, at distance 0, the average squared difference equals the nugget, and at distances greater than that called the "range" of spatial autocorrelation, the average squared difference equals the sill.Accordingly, the autocorrelated fraction, f , of the average squared difference between two dip wells i and j is We now define the "statistical" group size n of each dip well i to be the sum of one plus the autocorrelated fractions f i,j of all dip wells that are within the range of spatial autocorrelation of i, According to the discussion above, dip-well-specific weights can then be calculated with where n i is derived from Eq. ( 5).The equation shows that with increasing "statistical" group size n, i.e., with increasing spatial data density, the weight of an individual dip well is "down-weighted" to some degree, a behavior that corresponds to our initial intention to lower the influence of small peatlands compared to large ones.The error standard deviation σ e is dependent on several factors, e.g., the length of the time series, the temporal measurement density and the microtopography around the dip well.For simplicity, we here assumed σ e to be uniform for all dip wells, which simplifies Eq. ( 6) to w i = 1 √ n i .Only dip wells with the same land-use type were summed up with Eq. ( 5), which avoids the down-weighting by dip wells that have different land-use types.The latter are mostly characterized by fairly different WL t and thus by rather low spatial autocorrelation to dip well i.
After spatial correlation has been accounted for, the sum of the weights of all dip wells of each land-use type were adjusted that they correspond to the fractions of this land-use type in Germany.This adjustment accounts for the overrepresentation in the data set of dip wells in unused peatlands and underrepresentation of dip wells in arable land.

Model performance criteria
Model fit and predictive performance after cross-validation were quantified by the weighted root mean square error, where m is the number of dip wells, x o,i is observed WL or WL t of dip well i, x s,i is simulated WL or WL t of dip well i and w i is the data weight of dip well i (see below).We refer to the root mean square error of the predicted data of cross validation as RMSE cv .Model performance was further quantified by Nash-Sutcliffe efficiency (NSE), where x o is the mean of all observed WL or WL t .It indicates how well observed vs. predicted values match the 1 : 1 line.NSE is a good overall indicator of predictive performance because it combines scatter and bias (common offset and/or slope difference from 1:1 line) (Nash and Sutcliffe, 1970).
Values greater than 0 signify a model that is better than the reference model based on the data mean.We refer to the NSE of the training data as NSE cal , and of the predicted data of cross validation as NSE cv .Systematic errors were quantified by calculating the model bias, here defined as,

Model uncertainty and stability evaluation
Uncertainty of the model predictions was assessed by bootstrapping, cross-validation and residual analysis.
For the bootstrapping analysis, we followed the procedure of Leathwick et al. (2006).We estimated the confidence intervals around the predictions and the fitted functions by taking 1000 bootstrap samples of the 53 peatlands.The number of peatlands in each sample was equivalent to the data set, but peatlands were selected randomly with replacement.Using the predictor variables of the final model, a BRT model was fitted to each sample.Cross validation was again performed on peatlands, thus a peatland in the calibration data set was not part of the cross-validation data set to avoid overly optimistic results.Variances of the predictions and of the fitted functions of the 1000 models were evaluated.
If data sets are relatively small (e.g., n < 1000, De'ath, 2007) then the small size of the training and test data sets lowers model accuracy.Given the fairly small number of peatlands in the data set and the partly high spatial correlation of dip wells within these peatlands, we decided not to split the data set into a training and test data set.Estimates of model accuracy can then be based on cross-validation, thereby making effective use of all the data (De'ath, 2007).The prediction uncertainty of the final model is estimated by the root mean square error of prediction (RMSE cv , see above) for each land cover class.After testing for nearnormal distribution of the residuals, RMSE cv can be used to derive the 68 and 95 % confidence intervals of the predictions with RMSE cv and 2 × RMSE cv , respectively.

Regionalization
In the final regionalization step, the predictor variables contributing to the final model were determined at a 25 × 25 m raster for all organic soil in Germany.Predictor variables were determined with the same map input that was used for model building.Land cover information including information on ditches was based on the data from year 2012 and the climatic data was based on the average of the last 30 years.The fine spatial resolution of 25 × 25 m was not chosen to fool the reader with a highly spatially accurate model.Rather this fairly fine scale was necessary to map the relatively small-scale effects of the topography, land use and peatland geometry variables.The final model was then used to make a prediction for each of these raster cells.

Spatial correlation structure of the data set
The variogram model fitted to the sample variogram provided a nugget (0.012 m 2 ; 0.11 m), a sill (0.09 m 2 ; 0.3 m) and a range of spatial correlation (2700 m) for our data set of WL (Fig. 4).The nugget represents the very small-scale soil hydraulic variability and micro-topography effects on WL (van der Ploeg et al., 2012) and measurement error, e.g., by differences in the determination of the ground surface and in the timing of the manual measurements.Furthermore, microtopography (e.g., hummocks) and oscillating peat surfaces of wet peatlands pose a challenge for an accurate determination of both ground surface and water level.The water level time series in the data set were of different lengths and ranged from 1 to 20 years.Interannual variability of water levels can be large (e.g., Knotters and van Walsum, 1997).For simplicity, in our analysis, data were not harmonized by extrapolating WL time series using weather data to a 30-year period.Thus, the nugget also includes errors that are introduced by dip wells with different measurement periods that are located in the range of spatial correlation.In consideration of these error sources, the fitted nugget of 0.11 m appears to be a realistic value.At 0.3 m, the fitted sill matched nearly perfectly with the standard deviation of the data (0.31 m), which indicates consistency between semivariogram model and data set.The fitted range of spatial correlation of 2700 m reflects both physical effects, i.e., the average range of lateral flow due to hydraulic gradients, as well as the effect of average land-use patterns in Germany on the spatial correlation of WL.Fitted values were used in the calculation of the dipwell-specific weights using Eq. ( 6).

Typical water levels for land-use types in German organic soils
The land cover classes are characterized by plausible mean and median water levels, which show consistent differences between each other (Table 2 and Fig. 5a).The mean values of arable land and grassland agree with what can be expected for their agronomic requirements, with slightly lower water levels for arable land.The high variability observed for both classes may be related to the variability of the efficiency of installed drainage systems, as for example the presence and condition of tile drains and the depth of ditches.Grasslands can be managed with very variable intensity, which is partly reflected in different water levels.Figure 5a further shows that deciduous forests seem to dominate slightly drier organic soils compared to coniferous forests, which dominate under wetter conditions.A high variability of water levels is observed for the land cover class unused peatland.On the one hand, post peat-cutting topography increases the variability of WL over short distances.It probably contributes to the high variance observed for this class.On the other hand, this class comprises both rather dry unused peatlands and wetter peatlands in which re-wetting measures already took place, which however do not show yet a wet soil attribute in the ATKIS Digital Landscape Model.This may also cause part of the variance observed in the grassland and forest land cover class.All wet land cover classes (reed, wet grassland, wet forest and wet unused peatland) that were separated by wetness indication clearly show higher water levels, showing the wetness attribute of the Digital Landscape Model is a useful attribute.
Figure 5b shows the transformed water level for all classes.It can be observed that the variances of the wetter land cover increase relative to the variances of the dry land cover classes.This is due to the highest sensitivity of GHG emissions in the wet range of water levels (> −0.5 m).Consequently, the rather high variance of WL for arable land corresponds to a rather low variance of WL t , i.e., to a rather low assumed effect of WL variability on the GHG budget.

BRT model calibration and validation: WL vs. WL t
In contrast to land cover class, the other predictor variables showed, if at all, only weak relations to WL and WL t when evaluating them with box plots, 2-D cross plots and simple correlation matrices.Here, we expected BRT to detect the strongest predictor interactions and to identify the most informative predictors.
After model calibration with all predictors, subsequent model simplification successively dropped those parameters with correlation > 0.7 and the lowest contribution.For both, WL and WL t , model performance improved during this simplification.For WL t , the highest values of NSE cv of approximately 0.46 were achieved with 21 to 9 model parameters.shown in Fig. 6.Further elimination of parameters led to a pronounced decline of model performance.Similar behavior was observed for the calibration on WL.In favor of a more parsimonious model, we chose the model with the lowest number of parameters before the pronounced decline of model performance occurred.For the calibration on WL t , this corresponded to the model with lowest number of parameters that still achieved NSE cv values of > 0.45 (Fig. 6).The final WL t model comprised nine predictor variables, and the final WL model, seven parameters.The percentages of parameter contributions to the final model and their individual influences are discussed for WL t in Sect.3.4.Table 3 summarizes the statistical performances of the models calibrated on WL and WL t .For both models NSE cal is considerably higher than NSE cv and shows the commonly  −0.39 ± 0.36 −0.39 ± 0.41 −0.37 ± 0.40 Coniferous f. −0.36 ± 0.36 −0.37 ± 0.37 −0.46 ± 0.35 Wet unused peatl.−0.22 ± 0.27 −0.18 ± 0.40 −0.17 ± 0.36 Wet forest −0.22 ± 0.29 −0.17 ± 0.43 −0.21 ± 0.39 Wet grassland −0.10 ± 0.14 −0.00 ± 0.31 −0.15 ± 0.39 Reed −0.01 ± 0.17 0.20 ± 0.29 −0.06 ± 0.32 observed overfitting behavior of BRT models.The different measures that we conducted to minimize overfitting (crossvalidation on peatlands, restriction to monotonic responses and model simplification including elimination of highly correlated variables) lowered the difference between NSE cal and NSE cv but could not totally avoid overfitting.NSE cv of the WL t model (0.453) indicates higher predictive model performance compared to the WL model (0.381).However, as the data ranges differ due to the transformation, this comparison may be misleading.Therefore, we transformed the predictions of the WL model to obtain WL t values from this model and equally calculated the performance criteria (Table 3, second column).Then, NSE cv is slightly increased (0.397), but does not achieve the values of the model that was calibrated on WL t .A better predictive model performance of the model calibrated on WL t is also visible for the RMSE cv values.The total RMSE cv , as well as the RMSE cv values for the values from the model calibrated on WL.Given our hypothetical transfer function (Fig. 3) in which the GHG budget is linearly related to WL t , the higher accuracy of WL t predictions directly corresponds to a higher accuracy of GHG budget predictions.Superior model performance is also evident when evaluating model bias.Only when calibrating directly on WL t are the WL t predictions bias-free.Calibration on WL and subsequent transformation to WL t introduces a model bias towards systematically lower WL t values.In subsequent applications to GHG emission upscaling, lower WL t values would lead to an overestimation of CO 2 emissions and to an underestimation of CH 4 emissions.

Influence of predictor variables on WL t
Given the beneficial characteristics of the model calibrated on WL t for GHG upscaling, presentation and discussion of further model results is restricted to the WL t model.
The BRT method allows the analysis of the parameter contributions to, and influences on, the model (Elith et al., 2008) and thus may contribute to system understanding.The percentages of the contributions of the nine predictor variables to the final model ranged from 25.2 to 5.6 % (Fig. 7).Except protection status, at least one parameter of each of the seven parameter groups contributed to the final model.All protection status information was dropped early during the simplification process due to low contribution, although WL showed slightly higher values for data from nature protection or special areas of conservation.However, other parameters seem to be able to fully compensate the information that is lost by dropping this predictor.
Land cover class, lc, at the dip well was the parameter with strongest contribution (25.2 %).It basically follows the trend illustrated in Fig. 5b.The bootstrap error plotted as standard deviation (Fig. 7) shows the variation of this influence over the 1000 bootstrap models.A second land cover parameter, the fraction of dry land cover classes on organic soils in a buffer of 2500 m radius, f dry (2500), contributed to the model with 10.3 %.The monotonic decrease of WL t with increasing f dry (2500) is plausible as higher values reflect intensive land use in the surroundings of the dip well and thus indicate intensive artificial drainage.Together both parameter contributed 35.5 % and thus land cover represents the parameter group with the strongest model contribution.
Peatland characteristics are the second most important parameter group.The peatland type contributed 16 %.The model indicates that peatlands without any connection to surface water bodies (river or lake) and the class of other organic soils are characterized by lower WL t compared to the peatland types lowland bog, upland bog and fen neighboring surface water.As the class of other organic soils is generally expected to reflect lower water levels and as surface water may have a stabilization effect on water levels of organic soils, the influence of the peatland type can be considered plausible.Besides peatland type, the substrate of the peat base contributes 5.6 %.Here, organic soils overlying peat clay layers (e.g., limnic sediments such as calcareous gyttja) or basement rock are characterized by higher WL t compared to organic soils overlying unconsolidated rock.This can be explained by the lower drainage resistance of unconsolidated rocks.This may cause an increased efficiency of anthropogenic drainage and/or a general higher vulnerability to seepage losses.Finally, slightly lower WL t values are indicated by a high fraction of organic soils for the 500 m buffer, f peat (500).This may reflect the higher land-use pressure on large peatlands compared to rather small peatlands, which tentatively are more easily preserved by nature protection efforts.
The remaining four parameter groups are represented in the model by only one parameter each.The third most influential parameter was the length of ditches on arable land and grassland for the 250 m buffer, di len,dry (250).At first glance, it may be surprising that with increasing ditch density, WL t values tend to be higher as ditches are supposed to drain the water when land is used as arable land and grassland.The fact that the model identifies a rather strong effect in the opposite direction may be caused by incomplete information about the drainage network.There is not detailed information about the spatial distribution of tile drains.Based on expert knowledge, agricultural areas with a lower ditch density are more likely to have tile drains.As these drains, easily installed with a narrow drain spacing, are more effective at draining organic soils, low WL t values for arable land and grassland may be related to low ditch densities.Furthermore, ditches were originally dug at narrow spacing in especially wet areas of organic soils but there is no information available whether these ditches still function properly.
The parameters wb summer , h rel and ti ras25 all show expected trends.The model predicts higher WL t for increasing climatic water balance in the summer period (May to October), wb summer , for dip wells located in depressions (low values of h rel ) and for higher small-scale topographic wetness indices calculated on the 25 × 25 digital elevation model (ti ras25 ).Further insight into model behavior can be obtained by analyzing parameter interactions.This is obtained by changing two parameters simultaneously while keeping mean values for all other parameters (Elith et al., 2008).Figure 8 shows the two strongest parameter interactions.Parameter wb summer strongly interacts with p type .The generally lower values of WL t of fens without surface water connection and other organic soils show a stronger dependency on the summer climatic water balance.While a summer climatic water balance of > −80 mm shows a rather weak effect on WL t for the wetter peatland types, in contrast to the two drier peatland types there is still a strong effect with increasing wb summer .The trend for wb summer > 130 mm for the dry peatland types is supported by seven different peatlands.Another strong interaction is observed for p base and f dry (2500).While a rather weak effect of the fraction of arable land and grassland is observed for organic soils overlying basement rock and peat clay layer, a strong effect is observed for organic soils overlying unconsolidated rock.This interaction reflects the higher lateral range of drainage effects for organic soils with little flow resistance at the peat base.In these organic soils, intensive land use lowers the water level over large areas.

Discussion of model uncertainty
Plotting observed vs. predicted WL t from cross-validation (Fig. 9) illustrates the rather large residual variance that cannot be explained by the model.As indicated by the higher RMSE cv for the wet range (Table 3), scatter increases with increasing WL t .Error bars in the y direction indicate data error derived from the nugget of the variogram.It is shown for a few data points as an example.Due to transformation, data error increases for higher WL t .Figure 9 demonstrates that the fraction of unexplainable variance related to data error is much higher for the wet than for the dry range.Bootstrap error indicating the variation of the model predictions for 1000 bootstrap samples is shown in the x direction for the same data points.Bootstrap error is lower than the data error for the wet range and slightly higher for the dry range.
Bootstrap errors demonstrate the sensitivity of model predictions to changes of the data set used for calibration.When a model possesses structural deficits such as missing predictor variables, bootstrap errors should not be used to define confidence intervals for the model predictions.The prediction uncertainty derived from cross-validation is much higher than the bootstrap prediction uncertainty obtained from the bootstrap standard deviation (sd), with 2 × sd corresponding to the 95 % confidence interval (Fig. 10).The large difference between these values indicates that the model has structural deficits that can be attributed to several error sources: i. Key influences on WL t are missing in the set of predictor variables.None of the predictor variables indicate whether and to which extent water level increase due to re-wetting measures took place in the last years.Wetness indicators (wet soil and/or vegetation attributes) that are obtained from the Digital Landscape Model probably react with a delay of several years.Thus, we expect the occurrence of several observed high WL t values that cannot be explained by any of the predictor variables.
ii. Small-scale topography that is not represented with sufficient detail and accuracy in the DEM may cause several predictions to strongly differ from what would be expected from the other predictor variables.A common example may be a dip well that is located on a narrow peat ridge, which remained after peat-cutting and is absent in the DEM, and that is situated in an area classified as wet soil by the Digital Landscape Model.Then, the model indicates a WL t that is much higher than the observed WL t ; as for the observed value, the reference surface was the surface of the peat ridge.
iii.Consistent information about tile drains is missing and only exists on the regional scale (Tetzlaff et al., 2009).At the national scale, however, there are no maps on tile drains.Tile drains are known to have a strong effect on WL t for arable land and grassland.As explained above, we expect parameter di len,dry (250) to partially compensate for this missing information.
iv.Another source of prediction uncertainty may comprise inconsistent and erroneous land cover classification of the Digital Landscape Model due to the high degree of subjectivity for many of the attributes.Furthermore, the temporal accuracy of the Digital Landscape Model may be as inaccurate as 5 years which can cause time series with land-use change to be split at the wrong date, and vegetation and wetness attributes to be not yet updated to the current conditions.v.The water balance of fens strongly depends on the size and the hydraulic head of the groundwater catchment, i.e., of the aquifer underlying the peat layer.Unfortunately, there is no consistent map of hydraulic heads or groundwater catchments for all Germany.
We checked model predictions for geographical bias.Geographical location was not one of the model parameters.However, the history and policy of land use on organic soils, current ditch water management and climate do show largescale geographical trends.We divided our data set into the three major German peatland regions (NE, NW and S) and evaluated the model residuals (Fig. 11) to see whether our model is biased due to important missing geographical effects.A serious bias for any of the three major German peatland regions cannot be identified.When applying calibrated statistical models during regionalization, it is important to check model behavior for extrapolation outside the range of the parameter space that is covered by the data upon which the model was built.BRT always extrapolates at a constant value from the most extreme environmental value in the training data.In contrast to other types of statistical models, e.g., generalized linear models, BRT does not continue the fitted trend beyond the last observation.Regarding the categorical variables, the data set covers all classes occurring in Germany with several peatlands.The data set also covers the major range of values occurring in Germany for the numerical predictor variables.Furthermore, Fig. 7 indicates that the constant values, at which the model extrapolates the influence of the variables, do not raise major concern for any extreme predictions outside the parameter range.

Regionalization
The map of WL t resulting from the application of the fitted WL t model to all grid cells shows gradients at the regional scale (Fig. 12a).In the south of Germany, for example, a gradient from wet to dry can be observed for the pre-alpine upland bogs and the peatlands of the moraine plain.In the north of Germany, the map indicates that organic soils in the very NE are wetter than the rest.For the rest of the north, a slight gradient can be observed from less dry to dry from NW to E, which is mainly driven by the higher summer climatic water balance in the NW.As both categorical and numerical predictor variables do also vary at the sub-regional scale, the resulting map also shows gradients within peatland areas, e.g., due to small-scale land-use ditch density gradients and topography effects (Fig. 12b).
We calculated WL t averages of the land cover classes using the regionalized WL t from the map (Table 2, column 3).The given standard deviation comprises both the variability within a land cover class that is explained by the model as well as the uncertainty of each prediction.Resulting means and standard deviations slightly differ from the corresponding values of the data set.The land-cover-specific WL t values obtained from the map can be considered as being more representative as the regionalization procedure is supposed to partly account for potential bias in the data set.
When applying this map and its predicted WL t values in subsequent GHG upscaling, it is crucial that model uncertainty is propagated properly.An example demonstrates the necessity of uncertainty propagation.For a grid cell classified as wet grassland, the probability distribution of WL t is shown based on a normal distribution that was fitted to the residuals of this land cover class (Fig. 12c).Without propagating the uncertainty and when only translating the predicted WL t (eventually in combination with other parameters, e.g., soil properties) into a GHG budget, GHG budget is strongly underestimated as the WL t prediction is close to zero, indicating neither large CO 2 nor CH 4 emissions.When translating the full distribution of WL t into a GHG budget, the resulting GHG budget would be much higher as at both sides of the predicted WL t , the GHG budget increases.

Possible paths for model improvement
The model performance that is achieved by the statistical approach presented in our study raises the question whether collecting more WL data can improve model performance or whether the factor that is constraining the model performance is the limited strength of the nation-wide available predictor variables.To assess this question, additional "holdout models" were developed by fitting the BRT model to various random sets of data with a limited number of peatland areas (from 10 to 50 peatlands).For each number of peatland areas, 500 random selections were calibrated and model performance was evaluated with NSE cv .As expected, results indicate an increase of model performance with increasing number of peatlands used in the model building process (Fig. 13).Results also indicate a substantial flattening of the learning curve.Thus, further collection of WL data may only lead to a substantial model improvement when including many more peatlands into the data set.More promising would be the specific collection of more data on the weakly represented and/or important land cover classes, arable land and grassland.
Another path to achieve a stronger model is the development of new predictor variables.In the future, the availability of a more accurate DEM based on laser-scanning data, which is already available at full coverage for some federal states of Germany, may strongly increase the predictability of the observed WL data.Additionally, a nation-wide map of water management and of the distribution of tile drains would have great potential to explain large parts of the residual variance and/or even allow setting up a large-scale physically based model that includes water management.Furthermore, data harmonization by extrapolating the water level time series of our data set with the climatic boundary conditions of the last 30 years may lower the unexplainable variance of the data set due to short measurement periods (Bartholomeus et al., 2008), an effort that has been successfully conducted in Finke et al. (2004) using the transfer noise model of Bierkens et al. (1999).Finally, we believe that the inclusion of remote sensing products in our statistical model approach, as e.g., spaceborne microwave soil moisture observations (Sutanudjaja et al., 2013), may hold large potential to improve model performance as moisture differences due to varying water levels are high for organic soils.

Conclusions
Our study demonstrates the potential of statistical modeling for the regionalization of water levels in organic soils when data covers only a small fraction of peatlands of the final map and thus spatial interpolation is not possible.With the available data set of target and predictor variables, it was possible to predict 45 % of the GHG relevant water level variance in the data set in a cross-validation scheme.The variance is explained by nine predictor variables.With the analysis of their effect on the water level it was possible to gain insight into natural and anthropogenic boundary conditions that control water levels of organic soils in Germany.
Based on a hypothetical GHG transfer function relating GHG emissions to annual mean water levels (WL), we showed the advantages of transforming the annual mean water level into a new variable (WL t ) to which GHG emissions linearly depend on.The transformation improved model accuracy, increased the explained variance of the water level range that is relevant for GHG emissions and avoided model bias.
The presented approach is transparent and allows successive improvement when new input data and predictor variables become available.Our results show that model improvement by increasing number of WL t data, however, seems to be limited.If efforts are made, data collection should be concentrated on agriculturally used organic soils for which relatively few data is available.We believe that the constraining factor of model performance is rather the weakness of the predictor variables that are currently available at large scales.The development of new, more informative predictor variables, as for example water management maps and remote sensing products, may be the more promising path for model improvement.
The proposed regionalization approach is suited to application to any other country where similar data of target and predictor variables is available.It is important that the spatial resolution of the predictor variables is high enough (Finke et al., 2004).If predictor variables like land use and peatland type are only available at a much coarser scale and provided as percentages for grid cells, the dependency between predictor variables and the rather local WL will probably be lost for most of the predictor variables.
Our work must be considered as one piece of a broader framework for the regionalization of GHG emissions that includes other site characteristics and must be further developed in future research.For example, if for specific regions detailed information on peat properties becomes available and its effect on GHG emissions can be estimated by the use of multivariate transfer functions, the map of transformed water levels (WL t ) can be used as an input for this follow-up regionalization.

Figure 1 .
Figure 1.Locations of the 1094 dip wells of the data set.Base map (geological map 1 : 200 000, BGR) shows the distribution of bog and fen peat, and other organic soils.

Figure 2 .
Figure2.Illustration of the predictor variables determined for each dip well based on available national maps (see Table1).
Figure2.Illustration of the predictor variables determined for each dip well based on available national maps (see Table1).

Figure 3 .
Figure 3. Illustration of the annual mean water level (WL) transformation.Hypothetical transfer function relating GHG budget to WL (m) (a).GHG budget vs. the transformed water level (WL t ) (b).WL t vs. WL.The lines along the x axes indicate the data quantiles of the analyzed data set (c).

Figure 4 .
Figure 4. Sample semivariogram and fitted semivariogram model of the annual mean water level data, WL.

Figure 5 .
Figure 5. Water level relative to ground surface, WL (m), and transformed water level, WL t (−), by land cover class illustrated as a weighted box plot.WL t = −1 corresponds to maximum CO 2 emissions and WL t = 1 to maximum CH 4 emissions.In the top horizontal axes, the number of dip wells in each class is indicated.

Figure 6 .
Figure 6.NSE cv as a function of number of predictor variables used in the model of WL t during model simplification and shown for the last 50 parameter drops.

Figure 7 .
Figure7.Partial dependence plots for the predictor variables.For an explanation of variables see Table1.The y axes are on WL t scale and are centered around the mean WL t .Error bars and grey area indicate standard deviation of the response of over 1000 bootstrap models.The relative contribution of each predictor is indicated as percentage.The lines along the x axes of each plot show distribution of data across that variable, in deciles.
Figure7.Partial dependence plots for the predictor variables.For an explanation of variables see Table1.The y axes are on WL t scale and are centered around the mean WL t .Error bars and grey area indicate standard deviation of the response of over 1000 bootstrap models.The relative contribution of each predictor is indicated as percentage.The lines along the x axes of each plot show distribution of data across that variable, in deciles.

Figure 8 .
Figure 8. Partial dependence plots representing the two strongest interactions in the model: (a) between p type and wb summer and (b) between p base and f dry .Fitted WL t is plotted on the y axis which is obtained after accounting for the average effect of all other predictor variables.
Figure 10 shows residuals from cross-validation and standard deviation of bootstrap predictions for all land cover classes.The residuals of each land cover class show near-normal distributions.For five of the nine land cover classes (wet forest, wet unused peatland, arable land, coniferous forest and reed), the Shapiro-Wilk test of normality is positive (p > 0.05).Figure 10a further indicates that residuals of each land cover

Figure 9 .
Figure 9. Observed vs. predicted transformed annual mean water level (WL t ) from cross-validation results.Error bars show selected data and bootstrap model errors as standard deviation.Data points are scaled by their weights.

Figure 10 .
Figure 10.(a) Residuals (observation-prediction) of WL t predictions and (b) standard deviation (sd) of bootstrap predictions shown for the nine land cover classes.In the upper part, the number of dip wells in each class is indicated.

Figure 11 .
Figure 11.Residuals (observation-prediction) of WL t predictions for the three major geographical peatland regions of Germany.In the upper part, the number of dip wells in each class is indicated.

Figure 12 .
Figure 12.Map of predictions of transformed annual mean water level (WL t ) for all German organic soils (a) and an enlarged map section (b).Probability distribution in (c) indicates the uncertainty of a specific point prediction for wet grassland as an example.Here, predicted value is approximately WL t = 0, but note that wet grassland predictions do vary in space depending on the values of the other model parameters.The histogram shows the residuals from cross-validation for wet grassland, to which the probability distribution was fitted.

Figure 13 .
Figure 13.NSE of cross-validation vs. number of randomly selected peatland areas.Dashed lines indicate NSE cv ± sd.

Table 1 .
Overview on predictor variables.

Table 2 .
Weighted mean and standard deviation of WL and WL t data, and of the WL t map presented in Sect.3.6, for the nine land cover classes.

Table 3 .
Performance criteria of the different models; dry range defined as WL < −0.3 m and wet range as WL > −0.3 m.