Inter-comparison of energy balance and hydrological models for land surface energy flux estimation over a whole river catchment

Evapotranspiration (ET) is the main link between the natural water cycle and the land surface energy budget. Therefore water-balance and energy-balance approaches are two of the main methodologies for modelling this process. The water-balance approach is usually implemented as a complex, distributed hydrological model, while the energybalance approach is often used with remotely sensed observations of, for example, the land surface temperature (LST) and the state of the vegetation. In this study we compare the catchment-scale output of two remote sensing models based on the two-source energy-balance (TSEB) scheme, against a hydrological model, MIKE SHE, calibrated over the Skjern river catchment in western Denmark. The three models utilize different primary inputs to estimate ET (LST from different satellites in the case of remote sensing models and modelled soil moisture and heat flux in the case of the MIKE SHE ET module). However, all three of them use the same ancillary data (meteorological measurements, land cover type and leaf area index, etc.) and produce output at similar spatial resolution (1 km for the TSEB models, 500 m for MIKE SHE). The comparison is performed on the spatial patterns of the fluxes present within the catchment area as well as on temporal patterns on the whole catchment scale in 8-year long time series. The results show that the spatial patterns of latent heat flux produced by the remote sensing models are more similar to each other than to the fluxes produced by MIKE SHE. The temporal patterns produced by the remote sensing and hydrological models are quite highly correlated (r ≈ 0.8). This indicates potential benefits to the hydrological modelling community of integrating spatial information derived through remote sensing methodology (contained in the ET maps derived with the energy-balance models, satellite based LST or another source) into the hydrological models. How this could be achieved and how to evaluate the improvements, or lack of thereof, is still an open research question.


Introduction
Evapotranspiration (ET) acts as a coupling between two of the most important natural processes affecting the land surface: the water (mass) exchange and the energy exchange (Campbell and Norman, 1998).Therefore it has a strong impact on, and is impacted by, plant biophysics, weather and climate, and is an important component when modelling those processes.At the same time the knowledge of both the magnitude of water loss from the ground through evapotranspiration and spatial distribution of this flux has many practical applications, such as in agri-and aqua-culture, water resource management or drought monitoring (Anderson et al., 2012).This has led to an active interest from the research community in the spatially distributed modelling of evapotranspiration and to the development of a number of different methodologies.Two of the most common approaches are (1) the modelling of land surface energy fluxes, mostly with the use of land surface temperature (LST) maps derived from remote sensing observations, and (2) distributed physically based hydrological models.

R. Guzinski et al.: Inter-comparison of energy balance and hydrological models in a river catchment
The two types of modelling approaches have been compared previously, for example recently by Conradt et al. (2013) who compared ET patterns produced by hydrological model, remote sensing based model and ground measurements in sub-basins of the Elbe River.That paper concluded with a recommendation of further comparison studies of the different modelling approaches, especially using more than two independent models, to better understand their relative strengths and weaknesses.This should lead to improved model performance, but also to increased understanding of the errors (and their magnitudes) present in the different models, which is particularly important if the approaches are to be combined through, for example, data assimilation.Without the knowledge of errors, assimilating remotely sensed ET into hydrological models might not provide its full benefit.Pan et al. (2008), for example, found that assimilating ET derived with remote sensing observations into their hydrological model did not have a large impact on the modelled water budget since it was assumed that the calibrated hydrological model provided much more accurate ET values than the satellite observation based model.Schuurmans et al. (2011) found that although spatial patterns produced by the hydrological model became more realistic after the assimilation of satellite based ET, a major weakness was the lack of information about the standard error present in the ET estimates and lack of independent spatially distributed ET data set that could be used for validation.Therefore, in this study we compare two remote sensing based ET models and a hydrological model, with the aim of improving the understanding of their limitations and providing information that could be used in potential future integration of the approaches through data assimilation.
The remote sensing models of evapotranspiration (Kalma et al., 2008) aspire to minimize the calibration of site-specific parameters and the usage of locally derived data.Instead they aim to be applicable in a wide number of climatic and vegetation conditions without any major modifications, and to rely mostly on data acquired through satellite observations (e.g.LST) or regional-scale modelling (e.g.air temperature).This necessitates a number of assumptions and simplifications, which might lead to reduced accuracy of the modelled fluxes.Another feature of the remote sensing models is the treatment of each pixel within the modelling domain as a stand-alone sub-domain without any connections or interactions with the surrounding pixels.In some approaches, such as the triangle approach (see below), some of the model parameters are derived through common analysis of all the pixels in the domain, but the fluxes in the individual pixels are still derived individually.Similarly, the remote sensing models consider each satellite image as a stand-alone snapshot of the land surface conditions, with no memory of the past.
There are a number of remote sensing modelling methodologies being actively used by the research community ranging from simpler, empirical ones to more complex, physically based ones.One of the simpler approaches consist of the so-called "triangle" models, named after the shape resulting from plotting the pixel values of an LST map against pixel values of a vegetation index map.The evaporation fraction can then be derived by interpolating between the edges of the triangle (Jiang and Islam, 2001;Stisen et al., 2008).More advanced schemes characterize the ground surface as one layer (soil and vegetation combined) in one-source models (e.g.Surface Energy Balance System model, Su, 2002), or as two layers (soil and vegetation separately) in two-source models, the majority of which follow the two-source energy-balance (TSEB) modelling scheme (Norman et al., 1995).Both the one-source and two-source models characterize the fluxes of heat and moisture between the surface and the atmosphere in terms of a set of resistance equations, formulated from physically based models of boundary layer behaviour under different atmospheric conditions and vegetation covers.The twosource models have the advantage of explicitly representing the separate contributions of the soil and the vegetation, thus avoiding the need for parametrization of an "excess" resistance term whose value differs significantly from one reference to another (Norman et al., 1995;Matsushima, 2005;Kustas and Anderson, 2009;Boulet et al., 2012).
The distributed physically based hydrological models, in contrast to the remote sensing models, are heavily parametrized and calibrated for each individual catchment or study area (Refsgaard, 1997).Besides evapotranspiration, and other land surface fluxes, they can model a host of other hydrological processes such as channel flow, unsaturated zone flow or ground water flow and the interactions between those processes (Graham and Butts, 2005).This means that the modelling is performed in four dimensions (latitude, longitude, elevation or depth, and time) and that there is an interaction between the model pixels in both time and space.Due to their complex nature the hydrological models require significant computational resources and a large number of inputs for calibration and operation.During the operational stage the models require gridded meteorological input (including rainfall, air temperature and humidity), gridded soil hydraulic parameters and a digital elevation model among others (Stisen et al., 2011a).During the calibration stage (prior to the operational stage) additional information is required, for example measured hydrological parameters such as hydraulic head or stream outflow.
Since the hydrological models are calibrated using detailed hydrological observations, it is our hypothesis that the catchment-wide evapotranspiration estimated by those models is more accurate than the one derived with remote sensing models.On the other hand, we expect the energy-balance models driven by remote sensing observations to better represent the spatial patterns of the fluxes present within the catchment.We evaluate this hypothesis by running a hydrological model, MIKE SHE, described in Sect.4.1, and two TSEB based models, Dual-Temperature-Difference (DTD - Norman et al., 2000;Guzinski et al., 2013) and TSEB-2 An-gle Radiative Transfer (TSEB- 2ART -Nieto et al., 2013), described in Sect.4.2, over the Skjern river catchment located in western Denmark (see Sect. 2).Apart from being the largest river in Denmark, Skjern is also the study area of the Danish hydrological observatory HOBE (Jensen and Illangasekare, 2011), thus providing ample data for calibrating and validating the models.The three models are run with the same meteorological inputs, interpolated from field based observations, and the same land cover and leaf area index (LAI) maps, in order to minimize the uncertainties inherent in using different data sets.The differences between the models, and their input data, are described in Sect. 4.
The output land surface fluxes, and in particular the latent heat flux, from the three models are then inter-compared.The comparison is performed on a pixel-by-pixel basis as well as on catchment scale, and both systematic and unsystematic differences are analysed (Ji and Gallo, 2006).In addition, the catchment-scale temporal evolution of the evapotranspiration estimated from the three models is evaluated.Through this, we assess strengths and weaknesses of the different modelling approaches and in particular try to answer the following question: is there any additional (spatial) information present in inputs or outputs of remote sensing based surface energy-balance models that is missing from the physically based distributed hydrological model?

Study area
The study area covers the Skjern River catchment (Fig. 1) which is located on the western part of Denmark's Jutland peninsula and it is the largest river catchment in Denmark in terms of water volume.It has a roughly rectangular shape with the east-west length of around 60 km and north-south length of around 40 km.The Skjern River outlet is on the western side of the catchment with the discharge entering Ringkøbing Fjord.The terrain is mostly flat with a maximum elevation of 125 m a.s.l. and a gentle east to west slope.The soils are predominantly sandy with the main land use being agriculture and coniferous plantations.The catchment experiences a temperate maritime climate, with mean annual precipitation of 990 mm and mean annual temperature of 8.2 • C. Since 2007 the catchment is hosting the Danish Hydrological Observatory, HOBE, with numerous experiments and measurements concerning precipitation, evapotranspiration, greenhouse gas exchange, groundwater-surface water interactions and other related topics, making it highly suitable for calibrating and evaluating the distributed physically based hydrological models.For more details refer to Jensen and Illangasekare (2011).

Common model inputs
In order to compare the performance of the three models and not the accuracy of their inputs, the models used the same auxiliary input data whenever possible.Those common inputs consisted of maps with meteorological forcings, LAI, albedo and land cover types.For the meteorological forcing data, kriged fields of wind and temperature corrected precipitation from 43 rain gauges were used (Stisen et al., 2011b) together with air temperature, relative humidity, incoming shortwave radiation, wind speed and pressure interpolated from 16 climate stations.The locations of the rain gauges and climate stations in relation to the study area are presented in Fig. 1.The vegetation-related inputs were derived using remote sensing data with LAI estimated from MODIS NDVI (MOD13A1 product) following the study of Boegh et al. (2009) and albedo estimated from narrow band MODIS reflectance following Liang (2001).It should be noted that albedo maps were only shared by MIKE SHE and DTD, with TSEB-2ART producing its own albedo maps as one of the outputs.A land cover map was taken from the Danish Areal Information System run by the Danish Ministry of Environment (http: //www2.dmu.dk/1_Viden/2_Miljoe-tilstand/3_samfund/AIS/1a_Dynamisk_gis/Image_viewer/AAK_IMS_en.htm, last access: 29 January 2013), with the land cover dependent parameters listed in Table 1.The LAI correction factor, mentioned in the last row of Table 1, was derived during the calibration of the MIKE SHE model and is used as multiplicative factor for LAI estimated from MODIS NDVI for all land cover classes except for forests.Even though this is a MIKE SHE-specific parameter, it was also applied to LAI inputs to DTD and TSEB-2ART to ensure that comparable input data were used in all the models.
All common input data maps were delivered in UTM32-WGS84 projection.The LST observations used by the different models, as well as data used only by a single model, are described in the sections below.et al. (2011a) and Overgaard and Rosbjerg (2005).Briefly, the model couples ground-water and surface-water modules together with an ET module (Overgaard and Rosbjerg, 2005).
The SW-ET module, based on the two-source model of Shuttleworth and Wallace (1985), uses hydrological modules' outputs of soil moisture, soil heat flux and fraction of soil and leaf covered by ponded water.Besides these parameters, meteorological observations of air temperature and humidity, wind speed and incoming shortwave radiation and maps of albedo, LAI and land cover are used to solve a set of 10 linear equations for the temperature and humidity of dry and wet soil, dry and wet leaf and inter-canopy air (see Appendices A and D in Overgaard and Rosbjerg, 2005, for more details).With those parameters it is possible to estimate the effective soil and leaf temperatures as well as the radiometric surface temperature (LST) and the latent and sensible heat fluxes.Since the model simulates LST, it is possible to calibrate the model against remotely sensed LST in addition to hydrological variables such as hydraulic head or stream outflow.The model used in this study was calibrated for the Skjern river catchment against the above mentioned hydrological variables, LST taken from the MYD11A1 Aqua-MODIS product, evapotranspiration measured at three flux tower sites placed within the catchment area and soil moisture measurements from a distributed sensor network.The calibration methodology will be a topic of a subsequent paper.
As input the model requires gridded meteorological forcing data, soil hydraulic parameters and a number of parameters related to vegetation.The meteorological forcing data, LAI, albedo and land use maps are described in Sect.3. The soil hydraulic parameters came from a study of Greve et al. (2007).The derivation of other input parameters, such as soil surface roughness or irrigation water input, is described in Stisen et al. (2011a).The MIKE SHE SW-ET model, from now on referred to as MIKE SHE, was run at 500 m resolu-tion, and as output provided the surface energy fluxes (sensible, latent and ground heat fluxes and net radiation) together with LST and soil and canopy temperatures, T S and T C respectively.The outputs were bilinearly interpolated from 500 m to 1 km to match the resolution of the outputs from both remote sensing models.

TSEB modelling scheme
The TSEB approach (Norman et al., 1995) splits the observed directional LST into its two main components, namely the temperature of soil and canopy: where T R (θ ) is the LST observed at the view zenith angle (VZA) of θ and f (θ ) is the fraction of vegetation cover in the field of view of the sensor at VZAθ .This allows the model to estimate the latent and sensible heat fluxes from the soil and canopy separately, thus avoiding the need to parametrize the "excess" resistance term which is often present in singlesource models but for which there does not yet exist an established methodology for estimating its value (e.g.Matsushima, 2005;Boulet et al., 2012).
In the single-angle TSEB models, the latent heat flux of the canopy, LE C , is initially estimated using the assumption that the canopy is transpiring at the potential rate dictated by the divergence of net radiation in the canopy, R n,C , and a modified Priestly-Taylor approach.This allows an initial estimation of the sensible heat flux of the canopy, H C , and of T C .If the model returns unrealistic results (LE < 0 meaning condensation during daytime) the transpiration of the canopy can be iteratively reduced until realistic results are obtained (Norman et al., 1995).
In the dual-angle TSEB models, T S and T C can be derived directly from the observation geometry, followed by H S and H C and finally LE C as residual of the canopy energy balance.In both cases the total energy balance is ensured by estimating the latent heat flux from the soil, LE S , as residual: The two TSEB based models used in this study follow the principles described above but differ in other implementation details as described in the subsections below.

DTD model
The DTD model minimizes the influence of systematic error in the retrievals of LST and air temperature by replacing absolute temperature measurements with temperature change between two observations (Norman et al., 2000).In the original DTD model the first observations was early in the morning, when fluxes are minimal, and the second later in the morning or in the afternoon.Guzinski et al. (2013) demonstrated that replacing the early morning observations with night-time ones does not have a significant effect on the accuracy of the modelled fluxes, thereby facilitating the use of polar orbiting satellites with day and night overpasses, and introduced a simple scheme for accounting for vegetation phenology when estimating canopy transpiration.The model was further modified in Guzinski et al. (2014) where the resistance network to sensible heat flux was modified, to the so-called "series" configuration, to explicitly consider the in-canopy air temperature, thus improving the model performance during dry conditions.The DTD model formulation used in this study is as described in the Appendices of Guzinski et al. (2014), with the exception of the formulation of the resistance to heat transfer from the soil surface, R S .
The R S formulation used in the TSEB modelling scheme accounts for turbulent transport from free convection (Kustas and Norman, 1999): where c and b are constants given a value of 0.0025 m s −1 K −1/3 and 0.012 m s −1 respectively and u S is the wind speed just above the soil surface.However, since DTD aspires to use just time differential temperature measurements, it was originally decided to remove the (T S − T C ) 1/3 term from the resistance equation and instead to replace it with a LAI-dependent constant.For dense canopies T S − T C was assumed to be 5 K, while for sparse canopies it was assumed to be 15 K (Norman et al., 2000).Those assumptions made sense for the data sets used to evaluate the model performance, taken in New Mexico over June and July 1997 (http://hydrolab.arsusda.gov/sgp97,last accessed 27 February 2014) and in Arizona from June to September 1990 (Kustas and Goodrich, 1994).However, in the current study area, dominated by croplands located in temperate maritime climate (see Sect. 2), sparse canopy conditions are usually present in early spring and autumn when the difference between T S to T C is significantly less than 15 K. Therefore the R S formulation was further modified to make use of the difference in thermal inertia of LST and air temperature: where subscript 0 indicates temperatures estimate at night or early in the morning, and 1 indicates estimate at some other time during the day, and b and c have the same values as shown above.Thus, the use of both non-time differential temperature estimates and the assumptions about the magnitude of T S −T C are avoided.This formulation implicitly takes into account the amount of vegetation cover, since vegetation has larger thermal inertia than soil and thus (T R,1 −T R,0 ) is lower for dense canopies, while also reflecting the climatic conditions present in the study area.
The model uses MODIS LST estimates from the MYD11A1 product, together with land cover, LAI and albedo values derived as described in Sect. 3 and vegetation indices (normalized difference vegetation index and enhanced vegetation index) from the MOD13A2 product for estimating the fraction of vegetation that is green (Guzinski et al., 2013).The meteorological inputs are also as described in Sect.3. The MODIS LST and vegetation indices products were provided by NASA in a georeferenced grid with 930 m resolution and Sinusoidal projection.This was bilinear resampled to 1 km resolution grid and reprojected to UTM32-WGS84 projection.The modelled fluxes are output at 1 km resolution.

TSEB-2ART model
When TSEB is applied with single-angle LST, some assumptions are needed based on the expectation that plants transpire at their potential rate.This assumption may lead to significant errors in cases when plants are stressed, or when the potential canopy transpiration is not well defined.For that reason, the green fraction of vegetation (f g ) is an important parameter within the model since it improves TSEB accuracy in forested ecosystems and during senescence by taking into account the phenological development of the vegetation (Guzinski et al., 2013;Chirouze et al., 2014).However, there does not yet exist an established method of estimating f g using remote sensing data.To overcome this issue, dual-angle LST can be used for retrieving soil and canopy temperatures without employing any assumptions based on the canopy transpiration (Chehbouni et al., 2001;Kustas and Norman, 1997;Nieto et al., 2010a, b).Simple models for such retrieval have been proposed based on the proportion of vegetation and soil observed at two different viewing angles (Chehbouni et al., 2001;Kustas and Norman, 1997).However, since plant canopies are composed of finite leaves, multiple scattering of energy occurs within the canopy and therefore more physically complex methods for retrieving soil and canopy temperatures may be needed when using dual-angle LST measurements (François, 2002;Nieto et al., 2013).

R. Guzinski et al.: Inter-comparison of energy balance and hydrological models in a river catchment
The TSEB-2ART model (Nieto et al., 2013) couples a dual-angle version of TSEB introduced by Kustas and Norman (1997), with radiative transfer model (RTM) 4SAIL (Verhoef et al., 2007).Through this coupling it is possible to retrieve canopy and soil temperatures by analytically inverting the RTM with LST estimates of the same area but obtained through two different view zenith angles.4SAIL takes into account the different emissivities of the end members (canopy and soil) and hence multiple scattering of the thermal radiation, as well as the downwelling longwave radiation reflected by the surface.Therefore, the coupling should result in more accurate temperature retrievals compared to just using the geometric configuration of the observations (Nieto et al., 2013).Similarly to other TSEB based models, the canopy and soil temperatures are then used by TSEB-2ART to estimate the sensible heat flux of the canopy and soil respectively.In addition the RTM is used to estimate the net radiation, and radiation divergence in the canopy, while taking into account multiple scattering of the shortwave/longwave radiation between the soil and the canopy and within the canopy.The inclusion of 4SAIL also allows for the use of different leaf inclination distribution functions, rather than the spherical leaf distribution of the original TSEB (Norman et al., 1995;Kustas and Norman, 1999).Ground heat flux is estimated as a fraction of net radiation reaching the soil based on Choudhury et al. (1987).Finally, transpiration and soil evaporation can be obtained as residual terms of the vegetation and soil energy budgets.
TSEB-2ART has been evaluated over three flux tower sites within the HOBE area, obtaining more accurate flux retrievals than both the original dual-angle (Kustas and Norman, 1997) and the single-angle TSEB (Norman et al., 1995) implementations when driven by LST estimates derived with the AATSR sensor on board the Envisat satellite (Nieto et al., 2013).Even though the Envisat satellite is no longer functional, the model can be applied to the dual-angle LST observations in the future Sentinel 3 mission (Donlon et al., 2012).Apart from the AATSR derived LST the model uses the same meteorological data as well as land cover and LAI maps as MIKE SHE and DTD models but produces its own albedo as part of the implementation of 4SAIL.The Envisat LST was derived from the ATS_TOA_1P, AATSR Gridded Brightness Temperature and Radiance, product, which is a full resolution data set resampled to a 1 km×1 km grid for both the nadir and forward views by the European Space Agency (Scarpino and Cardaci, 2009).The split-window brightness temperatures (11 and 12 µm) for both forward and nadir were then reprojected to UTM32-WGS84 and resampled to the same 1 km resolution using a bilinear interpolation.LST at the two AATSR observation angles was then retrieved by the quadratic dual-channel split-window algorithm proposed by Coll et al. (2006) for AATSR.The modelled fluxes are output at 1 km resolution.

Comparison methodology
The spatial comparison was performed by selecting all the pixels in the Skjern catchment on all the days between 2003 and 2010 when at least 10 % of the catchment was cloud free during the night and day Aqua overpasses and which met the following conditions: -the pixel is not classified as water or urban area (met by 96 % of the catchment area); -all three models produce valid results, meaning LE > 0 W m −2 and H ≥ −100 W m −2 (met by 85 % of modelled fluxes).
This resulted in over 95 000 sets to be compared.A median moving-window filter of 3 × 3 pixels was applied to the output maps to reduce noise caused by image misregistration while preserving the spatial patterns found in the maps.
The comparison was performed using the instantaneous modelled sensible heat flux, latent heat flux and available energy (AE) defined as the net radiation minus the ground heat flux.The magnitude of those fluxes is strongly influenced by the incoming solar radiation and so it has a cyclic annual component with generally larger fluxes during the summer months and lower during the winter months.This could potentially influence the correlation between the fluxes modelled with different models.To remove this time dependent component and instead to evaluate the influence of water availability on the partitioning of the available energy into latent and sensible heat fluxes by the different models, the evaporative fraction (EF), defined as the ratio of energy used for evapotranspiration to the total available energy, was also used during the comparison.
When comparing the fluxes estimated by the three different models the time at which the fluxes are estimated must be taken into account.The TSEB-2ART fluxes are estimated at the time of the Envisat overpass, which is around 11:30 local time (LT), while the DTD fluxes are estimated at the time of Aqua overpass, around 12:00-13:00 LT.The MIKE SHE fluxes are estimated at hourly intervals throughout the day.Therefore, when comparing the fluxes between MIKE SHE and one of the satellite based models a linear interpolation was performed between the two MIKE SHE estimates bracketing the satellite based estimate (e.g. if satellite overpass was at 11:48, MIKE SHE estimates from 11:00 and 12:00 would be used).When comparing the fluxes from two satellite based models there is an offset present due to this time difference, although it should be reduced when comparing EF (Peng et al., 2013).A decision was made to perform the comparison using instantaneous modelled fluxes, and not their daily estimates, since extrapolating to daily values would just involve multiplying EF by the daily available energy (or net radiation assuming negligible daily G).Therefore the multiplicative factor would be the same (if field-measured daily available energy were used) or very similar (if modelled daily available energy were used) for the three models and no additional A number of statistical measures are used to explore the relation between the fluxes, and temperatures, estimated by the three models.The first one is the Pearson correlation coefficient, r, which measures the linear covariation of two data sets.To assess the differences between the data sets the root mean square difference (RMSD), which is the square root of the mean square difference (MSD), is used: where X i and Y i are the ith points in the X and Y data sets.
The MSD can be further split into systematic and unsystematic mean product differences (MPD), MPD s and MPD u respectively, where MPD s measures the distance between the geometric mean regression line (Barker et al., 1988) of two data sets and the 1-1 line, while MPD u measures the distance between the data sets' points and the geometric mean regression line (Ji and Gallo, 2006).The geometric mean regression is used instead of the linear regression since the former one assumes that both X and Y are subject to errors.Since MSD = MPD s + MPD u it is also possible to calculate the relative contribution of the systematic and unsystematic difference to the total difference as MPD s / MSD and MPD u / MSD respectively (Ji and Gallo, 2006).The systematic component of the difference represents the variation between the data sets that can be corrected by simple linear transformation of one of the data sets, while the unsystematic difference can be thought of as noise caused by some unknown factors which is harder to correct for (Ji et al., 2008).For presentation purposes a square root is taken of MPD s and MPD u to obtain RMPD s and RMPD u respectively which are then shown in the results' tables.The last statistical measure used is the mean bias, calculated as the difference between the means of two data sets.With the exception of the sign of the bias, all the statistical measures are symmetric, meaning that no assumption is made about the correctness or otherwise of any of the data sets and that the same values are obtained if the order of the data sets is reversed when calculating the measures.
The temporal patterns of evapotranspiration were evaluated at catchment scale, meaning that all the valid non-urban and non-water pixels within the catchment were averaged to determine the catchment-scale fluxes.It should be noted that since MIKE SHE also simulates the fluxes over water and urban pixels, this average is not the whole catchment evapotranspiration as modelled by MIKE SHE.However, since the number of water and urban pixels is quite small (Fig. 1), the averaged value should be close to the whole catchment evapotranspiration.Only those days on which the remote sensing models produced flux estimates in pixels representing at least 70 % of all non-urban and non-water catchment pixels were included in the analysis.In the case of DTD this condition was satisfied on 132 days over the 8-year period, while in the case of TSEB-2ART there were 68 valid days due to the less frequent revisit time of AATSR vs. MODIS.The catchment averages for each date were produced using the same set of pixels for the remote sensing models and MIKE SHE.The two data sets were compared using the r correlation coefficient, RMSD and bias and the ratio of RMSD and bias to the mean evapotranspiration estimated by MIKE SHE.

Spatial patterns
The results of pixel-to-pixel comparisons of fluxes between the three model pairs are presented in Figs. 2 (MIKE SHE-DTD), 3 (MIKE SHE-TSEB-2ART), and 4 (TSEB-2ART-DTD) with statistics summarized in Table 2 and described for each model pair in the subsections below.

MIKE SHE vs. DTD
The bias between the turbulent fluxes modelled with MIKE SHE and DTD is significant with a value of 19 W m −2 and −45 W m −2 for H and LE respectively.The RMSD is also quite large, at 78 W m −2 for H and 106 W m −2 for LE, and consequently the correlation coefficient between the modelled turbulent fluxes is relatively low, with a maximum value of 0.56.When the differences are split into systematic and unsystematic parts, 89 % of the error in H and 81 % of error in LE is unsystematic.The differences are propagated through to EF, leading to very low correlation although with a small bias.The differences in the turbulent fluxes cannot be caused mainly by differences in the parametrization of the available energy since in that case the correlation reaches 0.97.This was expected since the two models use the same incoming solar radiation forcing and the same albedo maps so the majority of the 35 W m −2 RMSD (55 % of MSD) is systematic and caused by the differences in the net longwave radiation estimation due to different LSTs, with DTD using MODIS LST and MIKE SHE the modelled LST from the SW-ET module, and by the ground heat flux calculations.

MIKE SHE vs. TSEB-2ART
The comparison of fluxes produced with MIKE SHE and TSEB-2ART follows a similar pattern as in the previous section, with relatively low correlation and significant RMSD but with much lower bias (maximum magnitude of 8 W m −2 in the case of H ). The other statistics are similar to the ones from MIKE SHE-DTD comparison, with RMSD of 84 W m −2 and r of 0.38 for H and 85 W m −2 and 0.58 for LE.The correlation of EF is slightly higher than in the case   of DTD, with a value of 0.25, and the characterization of AE is consistent between the two models, with a correlation of 0.95 and a bias of −7 W m −2 , with only 8 % of MSD being attributed to systematic errors.

TSEB-2ART vs. DTD
The correlation between the turbulent fluxes modelled with TSEB-2ART and DTD is the highest of any model pairs, with correlation coefficient of 0.42 for H and 0.70 for LE, even though the fluxes were obtained at different times of the day.The time offset is evident in the bias of AE, with the value of AE during the later Aqua overpass time being on average 55 W m −2 higher than the value of AE during Envisat overpass time.The biases are also present in the other flux estimates, particularly of LE, with a value of −77 W m −2 .However, even though the biases are much higher than in any other pair, the RMSD between TSEB-2ART and DTD estimated turbulent fluxes is comparable to RMSD of those fluxes between the other pairs.As can be seen from the split of the difference into systematic and unsystematic components, a large component of the MSD between the fluxes is systematic with RMPD u of LE reaching the lowest values of all the model pairs and RMPD u of H being very close to the minimum.The correlation and RMSD of EF is also the best of all the model pairs, with values of 0.33 and 0.18 respectively.
Table 2. Statistical comparison between MIKE SHE, DTD and TSEB-2ART models for sensible and latent heat fluxes (H and LE), available energy (AE) and evaporative fraction (EF).Statistics used: correlation coefficient (r), root mean square difference (RMSD), systematic and unsystematic root mean product differences (RMPD s and RMPD u respectively), the percentage of mean square difference (MSD) attributed to systematic and unsystematic mean product differences (MPD) (MPD s / MSD and MPD u / MSD respectively) and bias.The statistics for H , LE and AE are in W m −2 , with the exception of MPD s / MSD and MPD u / MSD, which are percentages.The statistics for EF are unitless, with the exception of MPD s / MSD and MPD u / MSD, which are percentages.

Temporal patterns
The results of comparing DTD and TSEB-2ART catchmentwide evapotranspiration estimates against MIKE SHE are presented in Fig. 5, with the statistics summarized in Table 3.The correlation between the latent heat fluxes modelled with DTD or TSEB-2ART and MIKE SHE are quite similar, with correlation coefficients having a value of 0.79 in the case of comparing MIKE SHE and DTD and 0.83 in the case of MIKE SHE and TSEB-2ART.The biases between the modelled fluxes are quite small, with the largest one present when looking at LE between MIKE SHE and DTD and having a value of −30 W m −2 , which represents just 13 % of the mean value of LE modelled by MIKE SHE.The RMSD values between DTD and MIKE SHE and between TSEB-2ART and MIKE SHE are 68 and 45 W m −2 respectively.This rep-Table 3. Statistical comparison of catchment-wide latent heat fluxes estimated by the model pairs (MIKE SHE-DTD and MIKE SHE-TSEB-2ART) for predominantly cloud-free days over the period of 8 years.Statistics used: correlation coefficient (r), root mean square difference (RMSD), relative RMSD (%RMSD), bias and relative bias (%bias).RMSD and bias are in W m −2 while %RMSD and %bias are calculated as the statistic divided by the mean of the MIKE SHE LE estimates and are percentages.Histogram of the pixel-wise differences between evaporative fraction (EF) estimated by MIKE SHE at the time of Aqua overpass and Envisat overpass.The differences between the two sets were evaluated using the two-sample t test and are found to be statistically significant with a p value smaller than 0.001.resents 28 % of the mean value of MIKE SHE LE in the case of DTD and 21 % in the case of TSEB-2ART.

Spatial patterns
Even though DTD and TSEB-2ART estimate fluxes at different times during the day, the correlation between H and LE estimated by those two models is as strong (in the case of H ) or stronger than between either of the models and MIKE SHE.In addition, the value of RMPD u between LE estimated with those two models is lower than for the other comparisons, even though the RMSD between LE modelled with TSEB-2ART and MIKE SHE is lower.This indicates that the spatial patterns produced by the remotely sensed models have a stronger agreement with each other than with the patterns produced by the hydrological model.It can be presumed that if the DTD and TSEB-2ART estimated the fluxes at the same time, the correlation would be even higher and the differences even smaller.
When the seasonal signal of the available energy is removed by replacing the turbulent fluxes by EF, the spatial patterns produced by the remote sensing models are still more strongly correlated than when either of them is compared to the hydrological model.The correlation coefficient of TSEB-2ART and DTD EF is 0.33 compared to the second highest value of 0.25 between MIKE SHE and TSEB-2ART EF.However, it should once again be kept in mind that the remote sensing models estimate the fluxes at different times of the day.Usually it is assumed that during clear sky days the EF remains constant throughout the daytime and especially around noon (Peng et al., 2013).However, by comparing the differences of EF modelled by MIKE SHE at the Aqua and Envisat overpass times (Fig. 6), it can be seen that EF differs between the overpasses.This could be due to the fact that for the majority of the days used in this study there was some cloud cover over the Skjern river catchment (the threshold of inclusion in the study was 10 % of cloud free pixels during the night and day Aqua overpass) meaning that it is highly probable that clouds have passed over the study pixels between the two satellite overpasses, breaking the assumption of self-preservation of EF (Crago, 1996).Therefore it can be assumed that if the EF from TSEB-2ART and DTD were estimated at the same time the correlation would be higher still.
Figure 7 and Table 4 present the results of comparing just the vegetation transpiration (LE C ) produced by the three models.The correlation coefficient of the modelled transpiration is higher than for bulk LE (transpiration and soil evaporation combined) for all the model pairs, with the correlation between TSEB-2ART and DTD still remaining the highest.However, the RMSD of transpiration is higher than that of bulk LE when comparing TSEB-2ART with both DTD and MIKE SHE.This is also the case with bias, which additionally switches sign.The differences between transpiration produced by MIKE SHE and DTD are smaller than between the bulk ET.Those statistics could indicate that either the retrieval of canopy temperatures is more accurate than the retrieval of soil temperatures (which appears to be corroborated by the discussion in Sect.7.1.2) or that the canopy component of two-source models is more physically sound.However, there are very few studies in which the two components of bulk LE are validated separately due to the difficulty of obtaining such measurements in situ, especially for comparison with models driven with satellite based observations.Therefore, the accuracy of those components has not yet been  established and focused studies are required in order to come to more definitive conclusions.
When considering the causes of the remaining differences in the modelled fluxes, some factors can be directly removed.The three models used many of the same spatial data sets as input: LAI maps, land cover map and meteorological forcing data (air temperature, incoming solar radiation, humidity and wind speed).In addition, DTD and MIKE SHE used the same albedo maps and MIKE SHE was calibrated using the same Aqua MODIS LST observations as used by DTD.The mismatch caused by image misregistration was reduced by applying the median filter over the output maps, although on cloudy days there are many isolated pixels, making the filtering less efficient.The available energy is very highly correlated in all three comparisons, with small RMSD and bias in the case of the two comparisons for which fluxes are estimated at the same hour, so this is also not a major contributor to the differences between the turbulent fluxes.
The remaining major causes of the observed differences in the model outputs could be (1) parametrization used in different land cover classes; (2) the LST input maps estimated by different sensors, in the case of DTD (MODIS) and TSEB-2ART (AATSR), or modelled, in the case of MIKE SHE; and (3) the differences in the modelling approach between the three models even though all of them apply the two-source modelling scheme.

Differences due to parameterization of land cover classes
Figures 8-10 show box plots of the turbulent fluxes, AE and EF split according to the land cover class.The graphs indicate that the statistical distribution of fluxes in the different land cover classes is quite similar among the models, albeit with a large number of outlier points in sensible heat estimations of all models and latent heat estimations of DTD.When looking at the median and 25th and 75th percentile values of evapotranspiration, the differences do not appear as significant as could be expected from the results shown in Table 2. Considering the DTD model, the ET estimated with the remote sensing model is generally larger than the MIKE SHE estimated ET across all land cover classes.The same can be observed in the case of AE but not in the case of H .This is largely due to the DTD soil heat flux (G) being affected by the LAI multiplicative factor (Table 1).The calculation of G in DTD is dependent on a fraction of the net radiation reaching the soil, which is fundamentally estimated based on the Beer-Lambert law.Therefore, an increase in the value of LAI input into the model leads to a decrease in the magnitude of G, which in turn means that AE is higher and that the magnitude of LE also increases.
In the case of TSEB-2ART, the range between the 25th and 75th percentile values of ET is smaller in croplands and grasslands when compared to MIKE SHE ET, while the median value of conifer forest ET is a bit larger.The range Maps of correlation, RMSD and bias between LE modelled with different model pairs (Fig. 11) show clear spatial patterns, at least partly influenced by land cover (see Fig. 1).There is clear lack of statistically significant correlation between MIKE SHE and the remote sensing models over the forested areas, which is not present when the two remote sensing models are compared.Additionally RMSD is generally higher in forests for all model pairs, but particularly when comparing MIKE SHE and DTD.The bias between LE modelled with TSEB-2ART and DTD is negative throughout the catchment (due to the later overpass of Aqua as compared to Envisat), with the exception of most of the forest areas, where it is slightly positive.Apart from the land cover influenced patterns there is a larger-scale pattern when comparing the outputs of the hydrological and remote sensing models: the correlation is lower, RMSD higher and bias negative in the northern and eastern parts of the catchment (with the exception of the very north-western tip), while the opposite is true in the south-western part.This is due to the first area being classified as having predominantly clayey soil and sec-ond as having predominantly coarse sandy soil (see Fig. 3 in Greve et al., 2007).This pattern is not visible when TSEB-2ART and DTD are compared which illustrates the sensitivity of the hydrological model to the proper characterization of soil hydraulic properties (which is difficult to do over large areas) and the advantage of the remote sensing models in not requiring this parameter.

Differences due to estimates of LST and its component temperatures
Table 6 shows the statistical comparison between the LST, which is used as input of the remote sensing models and is one of the outputs of the hydrological model, for all the pixels where flux comparison was also performed.The graphical representation is shown in the top left panels of Figs.12-14.
When comparing LST, it must be noted that it is dependent on the viewing geometry, such as VZA, which is quite different between the two satellites and also between the satellites and the hydrological model, for which the sensor is assumed to be directly at nadir.The correlation between the LST from the different model pairs is quite high, with r around 0.9 when comparing the remotely sensed LST from MODIS and AATSR with the MIKE SHE estimates, but reaching 0.97 when the two remotely sensed LSTs are compared.RMSD of LST is quite high, at 4.4 and 5.2 • C in the case of comparing MODIS and AATSR with MIKE SHE and around 3 • C when comparing MODIS with AATSR, although in this case the time difference between the observations should be kept in mind.
Although the high spatial correlation of LST would indicate that the different sources of LST are not a major component in the discrepancies between the modelled fluxes it must be noted that the fluxes are strongly dependent on the LST-T a gradient and that this dependency is non-linear due to the turbulent transport of heat between the surface and the overlying air (Obukhov, 1971).Due to this non-linearity, the systematic differences in LST between models can potentially lead to larger unsystematic differences in flux estimations.An additional complication in the current study is the fact that the DTD uses the relative temperature difference between night and day observations (Guzinski et al., 2013) and that TSEB-2ART is based on the differences of temperature between the nadir and forward LST observations (Nieto et al., 2013) for flux estimation.It is also interesting to note that although MIKE SHE was calibrated with MODIS Aqua LST observations in the Skjern River catchment, the two satellite based LSTs have a better agreement with each other than with MIKE SHE, despite the overpass times of the two satellites being different.This indicates that the use of LST observations from a satellite sensor, either as a forcing input for MIKE SHE model or for data assimilation (in addition to it being used for calibration), could potentially improve the spatial performance of the hydrological model.
Table 5. Statistical comparison between MIKE SHE, DTD and TSEB-2ART models for sensible and latent heat fluxes (H and LE), available energy (AE) and evaporative fraction (EF) for flux estimates when LAI < 4. Statistics used: correlation coefficient (r), root mean square difference (RMSD), systematic and unsystematic root mean product differences (RMPD s and RMPD u respectively), the percentage of mean square difference (MSD) attributed to systematic and unsystematic mean product differences (MPD) (MPD s / MSD and MPD u / MSD respectively) and bias.The statistics for H , LE and AE are in W m −2 , with the exception of MPD s / MSD and MPD u / MSD, which are percentages.The statistics for EF are unitless, with the exception of MPD s / MSD and MPD u / MSD, which are percentages.2 and density scatter plots in Figs.2-4).Correlation maps used the same set of values but only the pixels where correlation was significant at 5 % level are shown.
In addition, the canopy, soil and in-canopy air temperatures (T C , T S and T AC respectively) estimated by the different models are also compared in Table 6.The estimation of those temperatures could be considered to be an intermediate step during the estimation of the fluxes in the models (Norman et al., 1995;Overgaard and Rosbjerg, 2005), thereby allowing a deeper understanding of the internal model behaviour.The three models apply different methods for es-Table 6. Statistical comparison between MIKE SHE, DTD and TSEB-2ART models for the land surface temperatures (LST), canopy temperatures (T C ), soil temperatures (T S ) and in-canopy air temperatures (T AC ).Statistics used: correlation coefficient (r), root mean square difference (RMSD), systematic and unsystematic root mean product differences (RMPD s and RMPD u respectively), the percentage of mean square difference (MSD) attributed to systematic and unsystematic mean product differences (MPD) (MPD s / MSD and MPD u / MSD respectively) and bias.LST comes from Aqua MODIS observations in the case of DTD and nadir view Envisat AATSR observations in the case of TSEB-2ART, and is modelled in the case of MIKE SHE.The other temperatures are estimated by all models.The statistics are in    timating those temperatures.In the case of DTD, temperatures are not used directly during the flux estimation (the time differential temperature observations are used), but are derived as a final step when all the flux and resistance values are already established using rearranged Eqs.(A27), (A29) and (A33) from Guzinski et al. (2014).TSEB-2ART uses the viewing geometry of the two observation angles within a radiative transfer model framework to estimate T C and T S , which, together with the resistances to heat transport, are then used to calculate T AC and the fluxes using the TSEB formulations (Nieto et al., 2013;Kustas and Norman, 1997;Norman et al., 1995).In MIKE SHE, the temperatures, together with humidity, are derived by solving a set of 10 linear equations involving the resistances and AE as parameters, after which the turbulent fluxes are derived (Overgaard and Rosbjerg, 2005).Despite those three different methods the correlation between the temperatures is quite high (Table 6) which is surprising considering the much lower correlation between the modelled turbulent fluxes.Again, this is probably caused by the non-linearity between the gradient of temperatures and the heat flux due to turbulence.It could also be due to the heating effect that the interaction between soil temperature and heat fluxes produces for the temperature of the air at the canopy interface, when the resistances are configured in series.The highest correlation, above 0.93 for all the pairs, is for T AC , and the lowest, ranging from 0.63 to 0.72, is for T S .Overall the two remote sensing models have most similar spatial patterns of T C and T AC , and MIKE SHE and TSEB-2ART have most similar spatial pattern of T S .
Furthermore, since the TSEB-2ART model relies on the differences observed between the nadir and forward LST of AATSR in order to derive T C and T S , it is sensitive to errors in the estimation of LST at the two viewing angles.Those errors might be significant if, for example, atmospheric water vapour is not properly characterized and accounted for, due to the different optical path lengths of the forward (VZA = 55 • ) and nadir observations.Since the atmospheric path length of the forward view is longer, the a priori uncertainty in the estimation of forward LST is higher than in the case of nadir LST.In Figs. 13 and 14 it can be seen that this occurs in a number of cases, mostly leading to severe underestimation of T C and T AC and overestimation of T S when compared to the other models.This large bias in TSEB-2ART estimated T C is also present in the statistical comparison in Table 6.It also appears that MIKE SHE produces higher values at large magnitudes of T C (Figs. 12 and 13), although those difference are not severe.This relative overestimation of T C is reflected in MIKE SHE LST scatter plots, which indicates that it happens at high LAI values when vegetation cover fraction is close to 1.

Differences due to the modelling approach
Although there are differences in the estimated temperatures that could lead to larger unsystematic differences in the fluxes estimates, it is likely that there are also other factors contributing to the inconsistencies between fluxes.One of the factors could be the methodology employed by the different models for splitting of the available energy into the sensible and latent heat fluxes and in particular the way they estimate the resistances to heat and moisture transport.The two remote sensing models ensure the land surface energy balance by calculating latent heat flux as the residual of the other fluxes, i.e.LE = AE −H (Norman et al., 1995).The hydrological model, on the other hand, derives the latent and sensible heat fluxes concurrently (Overgaard and Rosbjerg, 2005).In addition the resistance network for LE in MIKE SHE has two extra resistance components compared to the resistance network for H : the resistance to soil evaporation and the stomata resistance to transpiration.Both of them depend on the soil moisture as modelled by the hydrological component of MIKE SHE.If the spatial patterns of the soil moisture estimated with MIKE SHE do not correspond closely to the spatial patterns seen by the satellites this could lead to the different spatial patterns of the estimated fluxes.
Another possible factor for the observed differences between the estimated fluxes could be the actual formulations  (1995), the hydrological model uses equations suggested by Choudhury and Monteith (1988).To evaluate whether those different formulations could be the reason for the fluxes estimated with the remote sensing models being more similar to each other than to the fluxes estimated with the hydrological model, it was decided to run the remote sensing models with the resistance equations taken from Choudhury and Monteith (1988).
The results are presented in Table 7.The correlation between the turbulent fluxes produced by all model pairs has decreased when compared to results in Table 2.This is surprising, as it could be expected that using the same resistance formulations would increase the correlation between the remote sensing models and the hydrological model.There was also a small increase in RMSD (with the exception of the MIKE SHE-TSEB-2ART H comparison).In addition, the number of valid pixels has been reduced from over 95 000 to over 83 000.This could indicate that the Norman et al. (1995) resistance formulations produce more realistic values than the Choudhury and Monteith (1988) formulations, which would point to the possibility of updating the equations used in the SW-ET module of MIKE SHE.Even when using the Choudhury and Monteith (1988) resistance formulations, the LE modelled with DTD and TSEB-2ART has the highest correlation.However, in the case of H , MIKE SHE and DTD produced the most correlated flux estimates, while the correlation of EF was very similar between the MIKE SHE-TSEB-2ART and DTD-TSEB-2ART model pairs.Finally, Fig. 15 illustrates the effect of modifying the R S formulation in the DTD, as proposed in Eq. ( 4).The R S values estimated with DTD and TSEB-2ART are compared for all the pixels where the flux comparison was performed.In this case, the TSEB-2ART derived R S can be thought of as the "true" value, since it uses the original R S equation (Eq.3), with T S and T C derived directly through the inversion of the RTM.The overestimation of R S by DTD, visible as a "bubble" in the left panel for DTD R S values between 150 and 200 s m −1 , is due to misparametrization of the differences between the canopy and soil temperatures in the original DTD formulation, and is mostly present in the coniferous forest.In the right panel this overestimation is less pronounced, indicating that the new R S equation is better at parametrizing this temperature difference.The correlation parameter between the two resistances has increased from 0.62 in the case of the old formulation to 0.68 in the case of the new one, while the RMSD has decreased by around 15 %, from 36 to 31 s m −1 .

Temporal patterns
Both remote sensing models are reasonably accurate in matching MIKE SHE catchment-wide estimates of evapotranspiration, with the seasonal curve clearly visible for both models (Fig. 5) and reflecting the MIKE SHE seasonality well.The DTD model tends to produce larger latent heat fluxes before and after the growing season.This is probably due to the larger magnitude of AE estimated by this model (see Table 2), which is mostly assigned to LE, as it is calculated as a residual of the surface energy balance.On the other hand TSEB-2ART matches MIKE SHE fluxes quite  (Norman et al., 2000) and in the right panel the new formulation is used (Eq.4).Red colour indicates higher density of points, blue colour lower density.
well during that period.During the growing season there is the largest mismatch between LE modelled by the remote sensing models and the hydrological model.This is partly due to the fluxes having the largest magnitude during this time, but also due to LAI having a large value which blocks the temperature signal from the soil surface (see Sect. 7.1.1)Figure 5 also highlights another weakness of the remote sensing models, namely that they only produce results on clear sky days.The great majority of latent heat fluxes estimated by the remote sensing models, and by the hydrological model on the same dates as the remote sensing models, lie above the line representing an average, all-weather ET for a particular DOY for all the years under study.This is also true when considering an 8-year averaged potential ET.The reason is because in the Skjern River environment the evapotranspiration is mainly driven by availability of energy (and not of water), and therefore on clear sky days the evapotranspiration will be higher than average.This has to be taken into account when extrapolating temporal patterns of evapotranspiration derived purely by the remote sensing input based models.
There are a couple of cases where the clear sky evapotranspiration modelled by MIKE SHE is much below the average line, even though the remote sensing models estimate much higher latent heat fluxes on those days.This most probably corresponds to days with soil drier than normal and could indicate: (1) a problem of the hydrological model in estimating the moisture of the upper layer of the soil or of the root zone during dry conditions, or (2) be related to uncertainties in the interpolated rainfall data due to omission by the rain gauges of local convective rainfall during the summer period.

Conclusions and outlook
Two remote sensing models and one hydrological model were run over an area covering a river catchment in west-ern Denmark and the spatial and temporal patterns of the modelled evapotranspiration were compared.The spatial patterns of latent and sensible heat fluxes as well as EF produced by the remote sensing models were more strongly correlated with each other than the patterns produced by either of the remote sensing models compared to the hydrological model.This was the case even though the two remote sensing models use both different data (MODIS and AATSR LST) and different approaches to estimating the fluxes and, additionally, those estimates were produced at different time of the day, due to different overpass times of satellites.This indicates that the remote sensing models might contain some additional information that is not currently present in the hydrological model.At the same time, the temporal patterns of evapotranspiration produced by both of the remote sensing models and the hydrological model were strongly correlated, with relatively small RMSD and small bias.Those observations would appear to support the hypothesis that the remote sensing models would better represent the spatial patterns of evapotranspiration present throughout the catchment, while the hydrological model would better represent the catchment-wide evapotranspiration.
This points towards a possibility of using the remotely sensed evapotranspiration to improve the spatial accuracy of distributed, physically based hydrological models.This could be achieved either through using the estimated latent heat flux as one of the calibrating parameters or through data assimilation during the model run.Certain attempts at incorporating spatial distributed data derived through remote sensing into hydrological models, either through data assimilation or calibration, have already been made but they were mostly focused on soil moisture (e.g.Draper et al., 2011;Corato et al., 2013), LST (Stisen et al., 2011a;Ridler et al., 2012) or LAI (Boegh et al., 2004).Pipunic et al. (2008) have looked at assimilating simulated H and LE estimates into a land surface model, however this was done with a one-dimensional single column model, i.e. without considering spatial patterns.Others have assimilated ET maps into distributed hydrological models but the impact of that assimilation was inconclusive (Pan et al., 2008;Schuurmans et al., 2011).Therefore, further studies are needed to establish whether ET, and in particular its spatial distribution, would bring any additional information beyond what is provided by soil moisture or LST estimates alone.In the case of MIKE SHE it might also be useful to use the T C , T S and T AC estimates from the remote sensing models to constraint the number of unknowns that need to be addressed in the model.Methodologies for validating the accuracy of spatial patterns at the catchment scale, while at the same time remaining independent of the model used, would also have to be investigated.

Figure 1 .
Figure 1.Land use map of the study area: the Skjern river catchment in the west of Denmark's Jutland peninsula.Model input meteorological data were interpolated from the measurements taken by the stations shown on the map.

Figure 2 .
Figure 2. Density scatter plot of over 95 000 points comparing the sensible heat flux (top left), latent heat flux (top right), available energy (bottom left) and evaporative fraction (bottom right) modelled by MIKE SHE and DTD.Red colour indicates higher density of points, blue colour lower density.

Figure 3 .
Figure 3. Density scatter plot of over 95 000 points comparing the sensible heat flux (top left), latent heat flux (top right), available energy (bottom left) and evaporative fraction (bottom right) modelled by MIKE SHE and TSEB-2ART.Red colour indicates higher density of points, blue colour lower density.

Figure 4 .
Figure 4. Density scatter plot of over 95 000 points comparing the sensible heat flux (top left), latent heat flux (top right), available energy (bottom left) and evaporative fraction (bottom right) modelled by TSEB-2ART and DTD.Red colour indicates higher density of points, blue colour lower density.

Figure 5 .
Figure 5. Average catchment-wide latent heat fluxes on the days when at least 70 % of non-water and non-urban pixels were modelled by either DTD (left) or TSEB-2ART (right).In the main graph the blue circles represent catchment fluxes modelled by MIKE SHE and the red crosses represent the catchment fluxes modelled by the remote sensing models on the same year and day of year (DOY) and at the same time of day.The figure contains dates from the 8 years under investigation and the blue solid line shows an 8-year averaged whole catchment ET for a particular DOY as modelled by MIKE SHE around the time of Aqua (left) or Envisat (right) overpass.The blue broken line shows potential ET for the same data set estimated using the Priestley-Taylor approach and MIKE SHE AE.The inset image contains a scatterplot of the MIKE SHE and remote sensing fluxes with black indicating fluxes from January to April, green from May to August and brown from September to December.
Figure6.Histogram of the pixel-wise differences between evaporative fraction (EF) estimated by MIKE SHE at the time of Aqua overpass and Envisat overpass.The differences between the two sets were evaluated using the two-sample t test and are found to be statistically significant with a p value smaller than 0.001.

Figure 7 .
Figure 7. Density scatter plot comparing the vegetation latent heat flux modelled by MIKE SHE and DTD (left), MIKE SHE and TSEB-2ART (centre) and TSEB-2ART and DTD (right).Red colour indicates higher density of points, blue colour lower density.

Figure 10 .
Figure 10.Box plots of sensible heat flux (top left), latent heat flux (top right), net radiation (bottom left) and evaporative fraction (bottom right) modelled by TSEB-2ART (leftward box in each category) and DTD (rightward box in each category) and split by land cover class.The red horizontal line indicates the median value with the upper and lower box edges indicating the 75th and 25th percentiles respectively.The whiskers extend to the furthest point within 1.5 times the inter-box range above or bellow the box edges with points beyond that categorized as outliers and marked individually as a red crosses.

Figure 11 .
Figure 11.Maps of spatial patterns of correlation (r -first column), RMSD (second column) and bias (third column) calculated with LE output of MIKE SHE-DTD (first row), MIKE SHE-TSEB-2ART (second row) and TSEB-2ART-DTD (third row) in the whole Skjern river catchment.For each pixel in the RMSD and bias maps a mean of the statistics' values was taken from all the points satisfying the conditions stated in Sect. 5 (i.e. the same set of values was used as for producing statistics in Table2and density scatter plots in Figs.2-4).Correlation maps used the same set of values but only the pixels where correlation was significant at 5 % level are shown.

Figure 12 .
Figure 12.Density scatter plot of over 95 000 points comparing land surface temperature (top left), canopy temperature (top right), soil temperature (bottom left) and in-canopy air temperature (bottom right).Land surface temperature on the x axis was modelled by MIKE SHE and on the y axis came from daytime observations from the MYD11A1 MODIS product.The other temperatures were modelled by both MIKE SHE and DTD.Red colour indicates higher density of points, blue colour lower density.

Figure 13 .
Figure 13.Density scatter plot of over 95 000 points comparing land surface temperature (top left), canopy temperature (top right), soil temperature (bottom left) and in-canopy air temperature (bottom right).Land surface temperature on the x axis was modelled by MIKE SHE and on the y axis came from nadir observations by AATSR sensor on the Envisat satellite.The other temperatures were modelled by both MIKE SHE and TSEB-2ART.Red colour indicates higher density of points, blue colour lower density.

Figure 14 .
Figure 14.Density scatter plot of over 95 000 points comparing land surface temperature (top left), canopy temperature (top right), soil temperature (bottom left) and in-canopy air temperature (bottom right).Land surface temperature on the x axis came from nadir observations by AATSR sensor on the Envisat satellite and on the y axis came from daytime observations from the MYD11A1 MODIS product.The other temperatures were modelled by both TSEB-2ART and DTD.Red colour indicates higher density of points, blue colour lower density.

Figure 15 .
Figure 15.Density scatter plot of over 95 000 points comparing resistance of heat transfer from the soil surface (R S ) modelled by the TSEB-2ART and DTD.In the left panel the original DTD formulation is used(Norman et al., 2000) and in the right panel the new formulation is used (Eq.4).Red colour indicates higher density of points, blue colour lower density.

Table 1 .
Land cover dependent parameters for the three models.The equations referred to in the table are (Eq.a) 0.15• LAI and (Eq.b) 0.12• LAI +0.07, where LAI is the leaf area index before multiplication by the LAI correction factor and has a minimum value of 0.5.

2) Hydrol. Earth Syst. Sci., 19, 2017-2036, 2015 www.hydrol-earth-syst-sci.net/19/2017/2015/ where
R n,S is the net radiation of the soil, H S is the sensible heat flux of the soil and G is the ground heat flux.

Guzinski et al.: Inter-comparison of energy balance and hydrological models in a river catchment 2023 information
or insight would be gained.On the contrary, the self-preservation of EF might not always hold over the whole study area due to frequently cloudy conditions, bringing additional complications and errors when extrapolating to daily values.

Table 4 .
Statistical comparison between MIKE SHE, DTD and TSEB-2ART models for latent heat flux of the canopy (LE C ). Statistics used: correlation coefficient (r), root mean square difference (RMSD), systematic and unsystematic root mean product differences (RMPD s and RMPD u respectively), the percentage of mean square difference (MSD) attributed to systematic and unsystematic mean product differ- ences (MPD) (MPD s / MSD and MPD u / MSD respectively) and bias.The statistics are in W m −2 , with the exception of MPD s / MSD and MPD u / MSD, which are percentages.
• C, with the exception of MPD s / MSD and MPD u / MSD, which are percentages.

Guzinski et al.: Inter-comparison of energy balance and hydrological models in a river catchment 2033Table 7 .
Choudhury and Monteith (1988)n MIKE SHE, DTD and TSEB-2ART models for sensible and latent heat fluxes (H and LE), available energy (AE) and evaporative fraction (EF) for model runs with resistance equations taken fromChoudhury and Monteith (1988).Statistics used: correlation coefficient (r), root mean square difference (RMSD), systematic and unsystematic root mean product differences (RMPD s and RMPD u respectively), the percentage of mean square difference (MSD) attributed to systematic and unsystematic mean product differences (MPD) (MPD s / MSD and MPD u / MSD respectively) and bias.The statistics for H , LE and AE are in W m −2 , with the exception of MPD s / MSD and MPD u / MSD, which are percentages.The statistics for EF are unitless, with the exception of MPD s / MSD and MPD u / MSD, which are percentages.
used for resistances of heat transfer between the soil, vegetation, in-canopy air and above-canopy air.While the two remote sensing models use equations based on Norman et al.