Articles | Volume 28, issue 13
https://doi.org/10.5194/hess-28-2991-2024
Technical note | 09 Jul 2024

Technical note: Surface fields for global environmental modelling

Margarita Choulga, Francesca Moschini, Cinzia Mazzetti, Stefania Grimaldi, Juliana Disperati, Hylke Beck, Peter Salamon, and Christel Prudhomme
Abstract

Climate change has resulted in more frequent occurrences of extreme events, such as flooding and heavy snowfall, which can have a significant impact on densely populated or industrialised areas. Numerical models are used to simulate and predict these extreme events, enabling informed decision-making and planning to minimise human casualties and to protect costly infrastructure. LISFLOOD is an integrated hydrological model underpinning the European Flood Awareness System and Global Flood Awareness System (EFAS and GloFAS, respectively), developed by the Copernicus Emergency Management Service (CEMS). The CEMS_SurfaceFields_2022 dataset is a new set of high-resolution surface fields at 1 and 3 arcmin resolution (approximately 2 and 6 km at the Equator, respectively) based on a wide variety of high-resolution and up-to-date data sources. The 1 arcmin fields cover Europe, while the surface fields at 3 arcmin cover the global land surface (excluding Antarctica). The dataset encompasses (i) catchment morphology and river networks, (ii) land use, (iii) vegetation cover type and properties, (iv) soil properties, (v) lake information, and (vi) water demand. This paper details the complete workflow used to generate the CEMS_SurfaceFields_2022 fields, including the data sources and methodology. Whilst created together with upgrades to the open-source LISFLOOD code, the CEMS_SurfaceFields_2022 fields can be used independently for a wide range of applications, including as input to hydrological, Earth system, or environmental models or for carrying out general analyses across spatial scales, ranging from global and regional levels to local levels (especially useful for regions outside Europe). The fields are expected to improve the accuracy, detail and realism of such applications.

1 Introduction

Current numerical Earth system models are highly complex. Thanks to the availability of high-performance computing; cloud computing; and a wide range of high-resolution environmental data derived from the use of ground, unconventional, and satellite measurement sensors, numerical global models are even able to reach kilometre-scale horizontal resolution. But an increase in spatial resolution also means that the Earth system and environmental models have to represent more surface and atmospheric processes and their interactions, which can become challenging, e.g. in complex orographic areas. Model accuracy heavily depends on the quality of the input surface fields (i.e. how realistic and up-to-date they are), and it is essential to minimise errors in surface fields. New high-resolution (i.e. 10–100 m) surface datasets based on daily satellite observations are now frequently released and continuously supported by, for example, the Copernicus programme (e.g. Global Land Cover: Buchhorn et al., 2021; GHS-BUILT-S: Pesaresi and Politis, 2022; Schiavina et al., 2022), which helps with achieving the goal of minimising surface field errors. Kimpson et al. (2023) showed, for example, that the use of accurate and up-to-date underlying information to generate the model's input surface fields can substantially reduce skin temperature errors even at 30 km horizontal resolution.

Following the digital revolution of cloud archiving and computing, where data, software and information technology (IT) infrastructure can be accessed by anyone from anywhere, the Earth systems and environmental modelling community has also moved from code developed by a single organisation and few contributors to so-called “community models”. Community model reference code is open for free use and/or development according to sharing principles. Such models include the Joint UK Land Environment Simulator (JULES) (Best et al., 2011; Clark et al., 2011; Marthews et al., 2022), OpenIFS (Sparrow et al., 2021; Carver, 2022; Huijnen et al., 2022; Köhler et al., 2023), the Community Land Model (CLM) (Lawrence et al., 2019), and LISFLOOD-OS (Van Der Knijff and De Roo, 2008).

To promote the seamless development of science and facilitate research community efforts in working with the same code and input data, providing feedback and improving the code and the data itself, powerful web-based platforms can be used. One of them is the Google Earth Engine (GEE; Gorelick et al., 2017), a free-of-charge platform that provides easy, web-based access to an extensive catalogue of satellite imagery and other geospatial data in an analysis-ready format. The data catalogue is embedded into the Google computing platform that lets you easily implement all personal workflows, which facilitates global-scale analysis and visualisation (GEE: FAQ, 2023). GEE was chosen for the generation of a new vast surface field set due to its high-resolution data catalogue and powerful computation capabilities.

This paper presents the methodology used to prepare the CEMS_SurfaceFields_2022 dataset containing all surface fields necessary to run the LISFLOOD-OS model at resolutions of approximately 2 km at the Equator, or 1 arcmin (over Europe; at the mid-latitude of the domain, 47.50° N, 1 arcmin is approximately 1.25 km), and of approximately 6 km at the Equator, or 3 arcmin (globally). CEMS_SurfaceFields_2022 was used in the set-up of the early warning systems of the Copernicus Emergency Management Service of the European Union for the European and global domains operational in December 2023 (EFASv5 and GloFASv4). Details on raw data collection, scientific protocol and technical methods aim to allow for the adequate understanding and interpretation of the surface field datasets. Any interested user can generate their own datasets by replicating or adapting the workflow to different fields, geographical domains, spatial resolutions, or content as relevant for downstream applications. The paper is structured as follows: Sect. 2 provides an overview of the surface fields, explains the criteria to select reference data (where and how they were processed) and outlines the general methodology to produce the surface fields; Sects. 3 to 8 detail the reference data and specific methodology applied to each surface field category, including examples of application; Sect. 9 provides all the relevant information for data access; Sect. 10 discusses the challenges of creating a consistent high-resolution continental- and global-scale set of surface fields and the opportunities disclosed by its availability.
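The relation between the angular resolutions and the quoted distances can be checked with a short calculation. The following is a minimal sketch, assuming a spherical Earth with a mean radius of 6371 km (an assumption of this example, not a value stated in the paper); the function name is hypothetical. It gives the east-west length of an arc at a given latitude and reproduces roughly 1.9 km and 5.6 km at the Equator and 1.25 km at 47.50° N.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def arcmin_to_km(arcmin: float, latitude_deg: float = 0.0) -> float:
    """East-west length in km of an arc of `arcmin` minutes at a given latitude."""
    km_per_arcmin_at_equator = 2.0 * math.pi * EARTH_RADIUS_KM / (360.0 * 60.0)
    return arcmin * km_per_arcmin_at_equator * math.cos(math.radians(latitude_deg))


print(arcmin_to_km(1))        # 1 arcmin at the Equator
print(arcmin_to_km(3))        # 3 arcmin at the Equator
print(arcmin_to_km(1, 47.5))  # 1 arcmin at 47.50° N
```

The north-south extent of a cell does not shrink with latitude, so only the east-west length is scaled by the cosine of the latitude.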

2 Surface fields for distributed environmental modelling

2.1 General information

Environmental models, especially land surface and hydrological models, simulate how water moves across the canopy, surface, subsurface, ground and eventually river channels using mechanistic equations that describe the physics of these processes. Each model represents processes with more or less complexity, depending on the model purpose and expected output (Rosbjerg and Madsen, 2006). With most represented terrestrial processes depending on the landscape, information describing the spatial variation in the geophysical and vegetation characteristics is needed. Such characteristics include morphological features (e.g. channel geometry, orography or slope); soil hydraulic property, land, and vegetation features (e.g. ecosystem cover type, leaf area index (LAI), evaporation rates, crop type, and planting and harvesting dates); and, if relevant, human intervention information such as population density or type of water usage.

LISFLOOD is a semi-distributed, physically based hydrological model which has been designed for the modelling of rainfall–runoff processes in large and transnational catchments (Bates and De Roo, 2000; De Roo et al., 2000, 2001; Van Der Knijff and De Roo, 2008; Van Der Knijff et al., 2010; Burek et al., 2013). In its most prominent application, LISFLOOD is used by the Copernicus Emergency Management Services' EFAS and GloFAS to provide medium-range and seasonal riverine flow forecasts (Alfieri et al., 2020). LISFLOOD is also widely used for a variety of applications, including water resources assessment (drought forecast); analysis of the impacts of land use changes, river regulation measures, and water management plans; and climate change analysis (e.g. Vanham et al., 2021).

To facilitate user uptake and enable the seamless development of science, LISFLOOD was released as open-source software in 2019, i.e. LISFLOOD-OS. The open-source suite includes the LISFLOOD hydrological model and a set of auxiliary tools for model set-up, calibration, and pre- and post-processing of the results. For instance, the evaporation pre-processor for the LISFLOOD model, i.e. LISVAP, can be used to compute evapotranspiration, which, together with total precipitation and average temperature, is one of the three meteorological variables strictly required as input to the hydrological model.

The modelling of runoff processes in different climates and socio-economic contexts then requires a set of raster fields (i.e. the set of surface fields presented in this paper) to provide information on terrain morphology, surface water bodies, soil properties, land cover and land use features, and water demand. The total number of fields ranges between 66, when only the essential rainfall–runoff processes are modelled, and 108 for a more comprehensive model set-up in which, for instance, lakes, reservoirs and water demand for anthropogenic use are included (available online: https://ec-jrc.github.io/lisflood-model/, last access: 18 June 2024).

The main model's field (i.e. in a technical sense for model operation/running) is a “mask” – a Boolean field that defines model boundaries, i.e. grid cells over which the model performs calculations and grid cells which are skipped (e.g. ocean grid cells). Whilst the surface fields described in this paper follow specific requirements of the LISFLOOD-OS model, it is a source of versatile information that can be used for any environmental modelling application, either directly or following a transformation, as relevant, as a full set or as a few consistent fields.

2.2 Reference data and methodology

To produce CEMS_SurfaceFields_2022, only data sources that are open source, freely available, updated as recently as possible and with recognised references for their quality were used (see Appendix A for all relevant reference data details). Note that whilst the majority of surface fields contain no time element, vegetation and water demand fields explicitly describe the annual cycle (vegetation, rice) or annual time evolution (water demand) and therefore have more stringent requirements regarding the data source. Global single-source datasets (e.g. Te Chow, 1959; Supit et al., 1994; Allen et al., 1998; Buchhorn et al., 2021) were favoured over regional and/or multiple data sources that needed to be combined in order to produce the required data, unless subset information was of much better quality (e.g. Moiret-Guigand, 2021). CEMS_SurfaceFields_2022 surface fields are based on 25 different data sources and consist of 140 gridded fields grouped into the six following groups: (i) catchment morphology and river network, (ii) land use, (iii) vegetation cover type and properties, (iv) soil properties, (v) lake information, and (vi) water demand.

Considering the high resolution (i.e. hundreds of metres) and volume (i.e. gigabytes) of most input datasets used to generate the surface fields, a high-performance data manipulation platform was needed. GEE (Gorelick et al., 2017) was selected as it provides an embedded, vast high-resolution data catalogue (e.g. the readily available MERIT DEM elevation dataset and the CGLS-LC100 and CLC2018 land cover datasets) and powerful computation capabilities. It also allows users to upload any raster or vector data (e.g. GeoTIFF or shapefiles) and to conduct computations tailored to each surface field. All GEE scripts were written in JavaScript to produce GeoTIFF files, which were converted to the final file format (NetCDF) locally after transfer from the GEE platform.

To ensure a consistent representation of physical processes at all scales, surface fields should be as coherent as possible among each other – between variables and across scales. Coherency can be achieved by using, where possible, the same input datasets to derive different field types (e.g. unique forest information input to create all forest-related surface fields) and making sure spatial aggregation or disaggregation across scales results in expected values. Figure 1 shows a simplified scheme that relates input datasets (e.g. CGLS-LC100) with the resulting surface fields (e.g. surface cover fractions – forest, inland water and sealed surface fraction fields), also highlighting fields requiring intermediary and sequential steps (e.g. forest fraction is needed to create soil parameter fields over forested and non-forested areas).
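The dependency of soil parameter fields on the forest fraction can be illustrated with a weighted average over the fine-resolution cells that fall into one coarse grid cell. This is a minimal sketch of that idea only: the function name and the sample values are hypothetical, and the exact weighting used for CEMS_SurfaceFields_2022 is the one documented in the paper's tables, not this code.

```python
def upscale_weighted_mean(values, weights):
    """Weighted mean of fine-resolution values inside one coarse grid cell.

    Returns None when the total weight is zero (e.g. no forest in the cell).
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical fine-resolution soil property values and forest fractions
soil_property = [0.40, 0.42, 0.38, 0.45]
forest_fraction = [1.0, 0.5, 0.0, 0.5]

# Same input, two consistent outputs: forested and non-forested parts
forest_value = upscale_weighted_mean(soil_property, forest_fraction)
non_forest_value = upscale_weighted_mean(
    soil_property, [1.0 - f for f in forest_fraction]
)
print(forest_value, non_forest_value)
```

Using the same forest fraction for both outputs is what keeps the forested and non-forested fields coherent with each other and with the land use fractions.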

For processes with horizontal dependency such as river routing, the relationship between grid cells (e.g. how the grid cells are connected) must be defined first so that all dependent fields can be generated on the same grid coordinates and spatial resolution and using consistent input data. For example, local drainage direction (LDD) defines how water moves across the model grid cells as a river drainage network (see Fig. 2) and strongly depends on elevation data (see Sect. 3 for more details). Because of the complex spatial dependency of a river drainage network, LDD must be created directly from elevation data at the required grid and resolution and cannot be resampled from a previous LDD field of a different grid and/or resolution. It is then used to define information on the river network, including upstream drainage area and gradient. Note that Fig. 1 is missing an arrow from MERIT DEM to LDD only because this step was mainly done by CaMa-Flood developers (see Sect. 3.2 for more details).
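To make the dependency of LDD on elevation concrete, the sketch below derives a drainage direction for each cell of a small elevation grid by steepest descent to one of its eight neighbours (the D8 idea). It is a generic illustration, not the CaMa-Flood procedure used for the dataset. The direction codes follow a numeric-keypad layout (7, 8, 9 for the upper row, 4 and 6 for west and east, 1, 2, 3 for the lower row, 5 for a pit), which is assumed here; grid spacing is taken as one unit in both directions, and the elevation values are hypothetical.

```python
import math

# (row offset, column offset) -> keypad direction code; rows increase southwards
KEYPAD = {
    (-1, -1): 7, (-1, 0): 8, (-1, 1): 9,
    (0, -1): 4,              (0, 1): 6,
    (1, -1): 1,  (1, 0): 2,  (1, 1): 3,
}
PIT = 5  # cell with no lower neighbour


def d8_ldd(dem):
    """Return a grid of direction codes pointing to the steepest downslope neighbour."""
    rows, cols = len(dem), len(dem[0])
    ldd = [[PIT] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for (dr, dc), code in KEYPAD.items():
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Diagonal neighbours are sqrt(2) cell lengths away
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        ldd[r][c] = code
    return ldd


dem = [
    [9, 8, 7],
    [8, 6, 4],
    [7, 4, 1],
]
ldd = d8_ldd(dem)
for row in ldd:
    print(row)
```

Because each direction is chosen from elevation differences at the target resolution, resampling an existing LDD to another grid would not preserve a connected network, which is why the field has to be rebuilt from elevation data.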

Four steps are involved in generating a particular surface field (see Table 1), with step 3 being the most complex and varied (see Fig. 2 for an example), and step 4 being necessary only for some model specifications (here as required by LISFLOOD; see Table 2).

All techniques applied (see Table 1) to generate CEMS_SurfaceFields_2022 can be reproduced with different input data and/or for different output data specifications. Further details on specific manipulations associated with each field category are given in the sections below as relevant. Each section has a table with the exact data source used per surface field and a step-by-step description of transformations applied to the data to compute the final fields included in CEMS_SurfaceFields_2022 (full technical descriptions for all fields are given in the LISFLOOD user guide, available online: https://ec-jrc.github.io/lisflood-code/4_Static-Maps-introduction/, last access: 18 June 2024). Although the specific requirements for the dataset were defined by LISFLOOD for the EFAS and GloFAS implementations, summarised in Table 2, they are consistent with the requirements of other environmental models. Regional examples of a sub-set of CEMS_SurfaceFields_2022 are provided to show the level of detail available at each resolution and field and to emphasise the consistency in all the fields, a critical requirement for environmental modelling and analysis. Examples focus on three regions of the world: the Po River (Europe), the Amazon River (South America) and the Brahmaputra river (Asia), with additional examples provided in Appendix D.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f01

Figure 1. Flow chart connecting input datasets and surface fields created. A dashed border denotes intermediate fields that are not part of the final dataset catalogue.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f02

Figure 2. Examples of data manipulation for (left column, panel a) transformation of elevation data into LDD (done within CaMa-Flood) and (right column, panel b) upscaling with weighted average for one final grid cell of soil hydraulic property over forested area.

Table 1. The four steps of a particular surface field generation and associated data manipulations.

Table 2. Dataset files technical specifications.

3 Catchment morphology and river network

3.1 General information

Morphology and channel shape information is essential for the computation of snow melting, temperature scaling and river routing. In other applications, the standard deviation of elevation and other orographic sub-grid parameters are critical for radiation parameterisation, especially for the shadowing effect. Channel geometry fields are needed to describe overbank inundation and infer inundated areas in wetland methane and soil carbon modelling. Land morphology is derived from elevation, and its variability within a single cell can be represented through slope, standard deviation, aspect, etc. River drainage information, derived from elevation, is used to connect the model cells according to the direction of the surface runoff, with channel geometry information used for routing processes.

The dataset contains 14 morphology and river network variables (names in brackets in italics correspond to the field names in the data repository).

  • Morphologic information: local drainage direction (i.e. flow direction from one cell to another; LDD, dimensionless), upstream drainage area (upArea, m²), grid cell area (pixarea, m²), grid cell length (pixlength, m), standard deviation of elevation (elvstd, m), gradient (i.e. elevation gradient; gradient, m m⁻¹);

  • Kinematic wave equation for routing: channel bottom width (chanbw, m), channel length (chanlength, m), channel gradient (changrad, m m⁻¹), Manning's roughness coefficient for channels (chanman, s m⁻¹/³);

  • River network information: channel mask (i.e. presence of river channel; chan, dimensionless), channel side slope (i.e. channel's horizontal distance divided by vertical distance; chans, m m⁻¹);

  • Open water evaporation: bankfull channel depth (chanbnkf, m), channel flood plain (i.e. width of the area where the surplus of water is distributed when the water level in the channel exceeds the channel depth; chanflpn, m).
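On a regular latitude-longitude grid the grid cell area (pixarea) varies with latitude. As an illustration, the sketch below computes the area of a cell on a sphere from its centre latitude and angular resolution. The spherical Earth, the radius value and the function name are assumptions of this example; the dataset documentation should be consulted for the method actually used.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical approximation)


def cell_area_m2(lat_centre_deg: float, resolution_arcmin: float) -> float:
    """Area in m² of a lat-lon grid cell centred at `lat_centre_deg` on a sphere."""
    size_rad = math.radians(resolution_arcmin / 60.0)
    lat = math.radians(lat_centre_deg)
    north = lat + size_rad / 2.0
    south = lat - size_rad / 2.0
    # Area of a spherical rectangle: R² · Δlon · (sin(lat_north) − sin(lat_south))
    return EARTH_RADIUS_M ** 2 * size_rad * (math.sin(north) - math.sin(south))


equator_area = cell_area_m2(0.0, 3.0)
high_lat_area = cell_area_m2(60.0, 3.0)
print(equator_area / 1e6, high_lat_area / 1e6)  # areas in km²
```

At 60° latitude a cell covers half the area of a cell at the Equator, which is why area-dependent quantities such as upstream drainage area need the per-cell area rather than a constant.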

3.2 Reference data and methodology

Environmental models require an accurate description of terrain and hydro-morphology to represent the hydrodynamics at the spatial resolution of the model. Here all catchment morphology and river network fields are derived from (i) the Catchment-based Macro-scale Floodplain (CaMa-Flood) Global River Hydrodynamics Model v4.0 maps (further referred to as CaMa-Flood) and (ii) the MERIT DEM: Multi-Error-Removed Improved-Terrain digital elevation model v.1.0.3 (further referred to as MERIT DEM). For reference data details, see Appendix A. All fields follow a complex sequential workflow (see Fig. 3 and Table 3). Note that whilst some river network fields were already directly available from the CaMa-Flood catalogue (e.g. LDD, channel length), they had to be adapted to the specific requirements of LISFLOOD. Fields also had to be specifically consistent with an interconnected river network described by the D8 algorithm (O'Callaghan and Mark, 1984; Fig. 2a) which is different to that used by the CaMa-Flood algorithm.
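Once an interconnected D8 network exists, fields such as upstream drainage area follow from it by accumulating cell areas along the flow directions. The sketch below is a generic illustration under stated assumptions: direction codes use a numeric-keypad layout with 5 as a pit (assumed here), the network contains no loops, and the upstream area of a cell includes the cell itself. The function name and the sample grids are hypothetical.

```python
# keypad direction code -> (row offset, column offset); rows increase southwards
OFFSETS = {
    7: (-1, -1), 8: (-1, 0), 9: (-1, 1),
    4: (0, -1),              6: (0, 1),
    1: (1, -1),  2: (1, 0),  3: (1, 1),
}


def upstream_area(ldd, cell_area):
    """Accumulate cell areas downstream along the drainage directions."""
    rows, cols = len(ldd), len(ldd[0])

    def downstream(r, c):
        offset = OFFSETS.get(ldd[r][c])  # None for pits (code 5)
        if offset is None:
            return None
        rr, cc = r + offset[0], c + offset[1]
        return (rr, cc) if 0 <= rr < rows and 0 <= cc < cols else None

    # Number of neighbours draining into each cell
    inflows = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                inflows[target[0]][target[1]] += 1

    area = [row[:] for row in cell_area]
    # Start from headwater cells and move downstream (topological order)
    pending = [(r, c) for r in range(rows) for c in range(cols) if inflows[r][c] == 0]
    while pending:
        r, c = pending.pop()
        target = downstream(r, c)
        if target is None:
            continue
        tr, tc = target
        area[tr][tc] += area[r][c]
        inflows[tr][tc] -= 1
        if inflows[tr][tc] == 0:
            pending.append(target)
    return area


ldd = [[3, 3, 2], [3, 3, 2], [6, 6, 5]]
unit_area = [[1.0] * 3 for _ in range(3)]
up_area = upstream_area(ldd, unit_area)
for row in up_area:
    print(row)
```

In this example the single pit collects the area of all nine cells, a quick check that the network is fully connected.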

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f03

Figure 3. Workflow of complex manipulations to create some of the morphology and river network fields; solid arrows indicate a function transformation, whereas dashed arrows indicate modification of existing input data to LISFLOOD specifications.

Table 3. Morphology and river network fields, their description, data source and applied transformation; * denotes transformation following Burek et al. (2014); name in brackets in italics next to each field corresponds to the name in the data repository.

3.3 Regional examples

Most fields in the catchment morphology and river network category are quite technical and hard to interpret. The ones that are easiest to interpret are upstream area and standard deviation of elevation, which are presented in Fig. 4 for the Po River area at 1 and 3 arcmin resolutions and in Fig. 5 for the Amazon River and Brahmaputra river areas at 3 arcmin resolution. The field of standard deviation of elevation shows a high level of detail over the Brahmaputra river, and the benefit of the high-resolution dataset is clearly seen over the Po River.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f04

Figure 4. Upstream drainage area in square metres (upper row, panels a and b) and standard deviation of elevation in metres (lower row, panels c and d) at 1 arcmin (approximately 1.9 km at the Equator, left column, panels a and c) and 3 arcmin (approximately 5.6 km at the Equator, right column, panels b and d) resolution for the Po River area in Italy.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f05

Figure 5. Upstream drainage area in square metres (upper row, panels a and b) and standard deviation of elevation in metres (lower row, panels c and d) at 3 arcmin (approximately 5.6 km at the Equator) resolution for the Amazon River area (left column, panels a and c) and Brahmaputra river area (right column, panels b and d).

4 Land use fields

4.1 General information

Land use is an essential component of environmental models. Many models use a sub-grid cell approach where a single grid cell can include several different land uses with each land use being subject to different prominent physical processes. This approach allows us to keep a high level of accuracy when representing how different types of land cover affect (Balsamo, 2013), for example, the hydrological cycle (e.g. evaporation is different in urban areas compared to forests) while limiting the increase in computational time. Application of land surface fractions includes grid-cell-weighted average skin temperature calculations, biogenic flux calculations, urban planning and climate mitigation plan preparation. For example, sealed surface fraction is necessary for carbon budget calculations and trace gas emissions in general, more explicitly for anthropogenic and residential emission calculations. Irrigated crop and irrigated rice fractions (combined with rice planting and harvesting days) are useful for crop yield and methane emissions modelling.

The dataset differentiates between six different land uses (names in brackets in italics correspond to the field names in the data repository):

  • Forest: areas where the main hydrological processes are canopy interception, evapotranspiration from canopies, canopies drainage and evapotranspiration, root uptake, and evaporation from the soil (fraction of forest; fracforest, dimensionless fraction);

  • Sealed surface: impervious areas where there is no water infiltration into the soil; that is, water accumulates in surface depressions, from which it can still evaporate. Once a depression is full, water is transported by surface runoff (fraction of sealed surface; fracsealed, dimensionless fraction);

  • Inland water: open water bodies where the most prominent hydrological process is evaporation (fraction of inland water; fracwater, dimensionless fraction);

  • Irrigated crops: areas used by agriculture – water is abstracted from ground water and surface water bodies to irrigate the fields. The main hydrological processes connected with the irrigated crops are canopy interception, evapotranspiration from canopies, canopies drainage and evapotranspiration, root uptake, and evaporation from the soil (fraction of all irrigated crops, excluding rice; fracirrigated, dimensionless fraction);

  • Irrigated rice: areas used to grow rice with the flooded irrigation agricultural technique, when water is abstracted from the inland water bodies and delivered to the rice fields. The main hydrological processes connected with rice fields are soil saturation, flooding, rice-growing phase, and soil drainage phase (fraction of irrigated rice; fracrice, dimensionless fraction);

  • Other land cover: used in canopy interception, evaporation from the canopies, canopy drainage, plant evapotranspiration and evaporation from the soil hydrological processes. The relative importance of these processes depends on the LAI (fraction of other cover types; fracother, dimensionless fraction).

4.2 Reference data and methodology

In models explicitly accounting for sub-grid variability, the fraction of each land use in every cell must be provided so that process representation for each land use can be weighted accordingly. Here, the majority of land use fields are derived from the Copernicus Global Land Service (CGLS) Land Cover (LC) 100 m map (further referred to as CGLS-LC100). Irrigated crops and irrigated rice fractions are derived from (i) the Spatial Production Allocation Model (SPAM), Global Spatially-Disaggregated Crop Production Statistics Data for 2010 v2.0 (further referred to as SPAM2010) and (ii) the Coordination of Information on the Environment (CORINE) Land Cover (CLC) inventory for 2018 (further referred to as CLC2018). For reference data details, see Appendix A. The derivation of fractions of the five land use classes used in LISFLOOD (and additional ocean fraction for consistency check) follows specific steps (see Fig. 6) summarised in Table 4. Note that LISFLOOD requires all “mask” (main model's field) grid cells to have at least one non-zero fraction type. Hence, the extra step in the generation of the inland water fraction field was to set empty grid cells (i.e. grid cells that based on the data source are fully covered with ocean) as fully covered with inland water.
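The derivation of fractions from a high-resolution categorical map can be pictured as counting, for each coarse grid cell, the share of fine cells in each class. The sketch below is illustrative only. The class labels are hypothetical stand-ins for the CGLS-LC100 classes; the factor of 0.75 for sealed surfaces is an assumption inferred from the Table 4 footnote (water can infiltrate in roughly 25 % of urban areas); and the handling of ocean-only cells follows the extra step described above.

```python
SEALED_SHARE_OF_BUILT_UP = 0.75  # assumption: about 25 % of urban area lets water infiltrate


def land_use_fractions(fine_labels):
    """Fractions of one coarse cell from the class labels of its fine cells."""
    n_cells = len(fine_labels)

    def share(label):
        return fine_labels.count(label) / n_cells

    fractions = {
        "fracforest": share("forest"),
        "fracwater": share("water"),
        "fracsealed": SEALED_SHARE_OF_BUILT_UP * share("built-up"),
        "fracocean": share("ocean"),
    }
    # Cells fully covered by ocean are set as fully covered with inland water,
    # so that every mask cell has at least one non-zero fraction.
    if fractions["fracocean"] == 1.0:
        fractions["fracwater"] = 1.0
    return fractions


mixed_cell = land_use_fractions(["forest"] * 6 + ["built-up"] * 2 + ["water"] * 2)
ocean_cell = land_use_fractions(["ocean"] * 4)
print(mixed_cell)
print(ocean_cell)
```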

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f06

Figure 6. Workflow of complex manipulations to create land use fields; solid arrows indicate a function transformation, whereas dotted arrows indicate upscaling; dashed boxes indicate the intermediate fields used for other field generation.

Table 4. Fraction of land use fields, their description, data source and applied transformations; “sum” refers to the sum of all fractions except “other land cover fraction”; cells with bold font show required intermediate fields; name in brackets in italics next to each field corresponds to the name in the data repository.

* For the sealed surface fraction, it is assumed that water can infiltrate in roughly 25 % of urban areas at kilometre scale through, for example, trees along the road, bushes along the fence, and grass or moss between concrete tiles or cobblestones.

To ensure consistency between fractions, the sum of all fraction fields must be 1 at any resolution. When the sum is greater than 1, the inland water fraction value is assumed correct (input data were corrected prior to computation over Fox Basin and the Caspian Sea), and all other fractions (fracXX) are corrected following Eq. (1):

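$$
\mathrm{fracXX} = \mathrm{fracXX}_{\mathrm{raw}} \left( 1 - \frac{\mathrm{fracwater}_{\mathrm{raw}} + \mathrm{fracocean}_{\mathrm{raw}} + \mathrm{fracforest}_{\mathrm{raw}} + \mathrm{fracsealed}_{\mathrm{raw}} + \mathrm{fracirrigated}_{\mathrm{raw}} + \mathrm{fracrice}_{\mathrm{raw}} - 1}{\mathrm{fracforest}_{\mathrm{raw}} + \mathrm{fracsealed}_{\mathrm{raw}} + \mathrm{fracirrigated}_{\mathrm{raw}} + \mathrm{fracrice}_{\mathrm{raw}}} \right), \tag{1}
$$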

where raw refers to the original (i.e. before consistency check) fraction of XX which can be the forest, irrigated crops, rice or sealed surfaces.
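The consistency correction of Eq. (1) can be written as a short function. This is a minimal sketch of the equation as read here: the excess of the total over 1 is removed from the forest, sealed, irrigated and rice fractions in proportion to their raw values, while the water and ocean fractions are kept. The function name and sample values are hypothetical, and the sketch assumes that the raw water and ocean fractions together do not exceed 1.

```python
ADJUSTABLE = ("fracforest", "fracsealed", "fracirrigated", "fracrice")
KEPT = ("fracwater", "fracocean")


def correct_fractions(raw):
    """Apply Eq. (1): rescale adjustable fractions so that all fractions sum to 1."""
    sum_adjustable = sum(raw[name] for name in ADJUSTABLE)
    total = sum_adjustable + sum(raw[name] for name in KEPT)
    corrected = dict(raw)
    # Only cells whose fractions sum to more than 1 are corrected
    if total <= 1.0 or sum_adjustable == 0.0:
        return corrected
    scale = 1.0 - (total - 1.0) / sum_adjustable
    for name in ADJUSTABLE:
        corrected[name] = raw[name] * scale
    return corrected


raw = {
    "fracwater": 0.3, "fracocean": 0.0, "fracforest": 0.5,
    "fracsealed": 0.2, "fracirrigated": 0.1, "fracrice": 0.1,
}
fixed = correct_fractions(raw)
print(fixed, sum(fixed.values()))
```

Because every adjustable fraction is multiplied by the same factor, their relative proportions are preserved and only their total shrinks to the space left by water and ocean.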

The generated fraction fields, e.g. forest (see Fig. 7a) and other land cover (see Fig. 7b), have generally good consistency with other up-to-date products like ESA CCI Land Cover time series v2.0.7 (ESA CCI map viewer https://maps.elie.ucl.ac.be/CCI/viewer/, last access: 18 June 2024; Defourny et al., 2017).

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f07

Figure 7. Fraction of forest (left column, panel a) and fraction of other land cover (right column, panel b) at 3 arcmin (approximately 5.6 km at the Equator) resolution for the global region.

4.3 Regional examples

All fields in the land use category are easy to interpret as they represent the fraction of grid cell covered by one or another surface cover type. The most interesting ones are fraction of forest, fraction of inland water, fraction of irrigated crops and fraction of rice. These fractions are presented in Fig. 8 for the Po River area at 1 and 3 arcmin resolutions and in Fig. 9 for the Amazon River and Brahmaputra river areas at 3 arcmin resolution. Figures show a high level of detail visible for the fields of fraction of forest and fraction of inland water (e.g. Amazon River), especially at the highest spatial resolution (Po River).

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f08

Figure 8. Fraction of forest (upper row, panels a and b), fraction of inland water (second row, panels c and d), fraction of irrigated crops (third row, panels e and f) and fraction of rice (lower row, panels g and h) at 1 arcmin (approximately 1.9 km at the Equator, left column, panels a, c, e and g) and 3 arcmin (approximately 5.6 km at the Equator, right column, panels b, d, f and h) resolution for the Po River area in Italy.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f09

Figure 9. Fraction of forest (upper row, panels a and b), fraction of inland water (second row, panels c and d), fraction of irrigated crops (third row, panels e and f) and fraction of rice (lower row, panels g and h) at 3 arcmin (approximately 5.6 km at the Equator) resolution for the Amazon River area (left column, panels a, c, e and g) and Brahmaputra river area (right column, panels b, d, f and h).

5 Vegetation properties

5.1 General information

Vegetation-related information contributes to the computation of precipitation interception, evaporation, transpiration, and root water uptake. Depending on the model, vegetation dynamics can be represented with different degrees of complexity, including in hydrological processes, vegetation growth and feedback on climate (Bonan et al., 2003). Because rice is the world's most important food crop and has specific water demands, its water cycle is often considered explicitly. Rice planting and harvesting dates, which are critical information for representing the inter-annual variability in its water demand, are provided for a maximum of three growing seasons. The variables allow us to model how vegetation affects the hydrology, with a particular focus on root water uptake and transpiration depending on vegetation type and vegetation state (e.g. water stress conditions). For example, the crop group number depends on the critical amount of soil moisture below which water uptake by plants is reduced as they start closing their stomata. Alternative uses of fields such as the leaf area index (LAI) include biomass allocation, which can be used for fire danger forecasting and carbon stock monitoring. Rice planting/harvesting days are important for modelling the yearly cycle of methane.
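Planting and harvesting days given as calendar day numbers can be used to test whether a given day falls inside a rice growing season, including seasons that run across the turn of the year. The sketch below is illustrative: the function names are hypothetical, and missing seasons are represented by None, which is an assumption of this example rather than the encoding used in the dataset.

```python
def in_growing_season(day_of_year, planting_day, harvest_day):
    """True if `day_of_year` lies between planting and harvest (inclusive)."""
    if planting_day <= harvest_day:
        return planting_day <= day_of_year <= harvest_day
    # Season wraps around the end of the year (e.g. planted in autumn)
    return day_of_year >= planting_day or day_of_year <= harvest_day


def rice_is_growing(day_of_year, seasons):
    """Check up to three (planting_day, harvest_day) pairs; None marks an unused season."""
    return any(
        in_growing_season(day_of_year, planting, harvest)
        for planting, harvest in seasons
        if planting is not None and harvest is not None
    )


# Hypothetical cell with two seasons, the second crossing the new year
seasons = [(120, 250), (300, 60), (None, None)]
print(rice_is_growing(10, seasons), rice_is_growing(280, seasons))
```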

The dataset describes vegetation properties through four variables (note that LAI consists in total of thirty-six 10 d average fields) for each of forest (_f), irrigated crops (_i), and other land cover types (_o) and another six (two types times three seasons) variables for rice (names in brackets in italics correspond to the field names in the data repository):

  • Transpiration rate: crop coefficient (cropcoef_f, cropcoef_i, cropcoef_o, dimensionless);

  • Water uptake: crop group number (cropgrpn_f, cropgrpn_i, cropgrpn_o, dimensionless);

  • Surface runoff generation and water routing: Manning's surface roughness coefficient (mannings_f, mannings_o, s m⁻¹/³), rice planting and harvesting days (riceplantingday1, riceplantingday2, riceplantingday3, calendar day number; riceharvestday1, riceharvestday2, riceharvestday3, calendar day number);

  • Water interception and evaporation: leaf area index (laif, laii, laio, m² m⁻²).

5.2 Reference data and methodology

In addition to the land use fraction, the distribution of vegetation types and characteristics is required to capture differences in environmental processes, such as evaporative water uptake. Here the vegetation properties are derived from several data sources, using maps to account for the spatial distribution of species (i.e. CGLS-LC100 and SPAM2010) and tables to obtain the associated hydrodynamic properties for crops, e.g. (i) the Food and Agriculture Organisation (FAO) of the United Nations Irrigation and Drainage Paper No. 56 (further referred to as FAO56), (ii) the WOFOST 6.0 crop simulation model description (further referred to as SUPIT), and (iii) for river hydraulics the Open-Channel Hydraulics manual (further referred to as CHOW). Time evolution of vegetation is based on the Copernicus Global Land Service (CGLS) Leaf Area Index (LAI) 1 km Version 2 collection (further referred to as CGLS-LAI); time evolution of crops is based on the RiceAtlas v3 (further referred to as RiceAtlas). For reference data details, see Appendix A. Combining these sources requires assumptions to be made where different sources do not contain the same information, and transformations need to be applied depending on the vegetation type. The main data sources and general transformation steps (see Fig. 10) to derive the 18 vegetation property fields are summarised in Table 5 and the following text. Note that the “crop group number” variable corresponds to a water depletion value and can be averaged across different crop types.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f10

Figure 10. Workflow of complex manipulations to create some of the vegetation property fields, e.g. crop coefficient (left column, upper row, panel a), Manning's surface roughness coefficient (right column, upper row, panel b), crop group number (left column, lower row, panel c), and root depth (right column, lower row, panel d); solid arrows indicate a function transformation, whereas dotted arrows indicate upscaling; dashed boxes indicate the intermediate fields used for other field generation, whereas dotted boxes indicate the fields only used for the vegetation-related fields.

Table 5. Vegetation property fields, their description, data source and applied transformations; cells with bold font show required intermediate fields; name in brackets in italics next to each field corresponds to the name in the data repository.

The final step for the crop coefficient, crop group number, and Manning's surface roughness coefficient, as well as for the additional crop height (for crop coefficient calculation) and root depth (for soil depth calculation, see Sect. 6.2) fields, for forest, irrigated crops, and other land cover types is to compute weighted averages of their components (e.g. different forest types) following Eq. (2):

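$$K = \frac{A_{1}\,\mathrm{fr}_{1}\,K_{1} + A_{2}\,\mathrm{fr}_{2}\,K_{2} + \dots + A_{N}\,\mathrm{fr}_{N}\,K_{N}}{A_{1}\,\mathrm{fr}_{1} + A_{2}\,\mathrm{fr}_{2} + \dots + A_{N}\,\mathrm{fr}_{N}}, \tag{2}$$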

where A is a scaling parameter (equal to 1, except for the crop coefficient, where it equals the crop height); fr is the fraction of the crop or land cover type; K is the default (i.e. source-table-based) value of the variable in question; and the indices 1 … N run over the crop or land cover types included in the field (i.e. N = 12 for forest, N = 41 for irrigated crops, N = 7 for other land cover types; for CGLS-LC100-type “40” (cropland), default values are based on 42 rainfed crops).
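
As an illustration of Eq. (2), the following minimal Python sketch computes the weighted average for a single grid cell. The function name and all numerical values are hypothetical and chosen only for demonstration; they are not taken from the dataset or from the source tables.

```python
from typing import Optional, Sequence


def weighted_property(
    fractions: Sequence[float],
    defaults: Sequence[float],
    scaling: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of Eq. (2) for one grid cell.

    fractions: fr_1..fr_N, fraction of each crop or land cover type
    defaults:  K_1..K_N, default (table-based) value for each type
    scaling:   A_1..A_N; defaults to 1 for every type
    """
    if scaling is None:
        scaling = [1.0] * len(fractions)
    if not (len(fractions) == len(defaults) == len(scaling)):
        raise ValueError("all inputs must have the same length")

    denominator = sum(a * fr for a, fr in zip(scaling, fractions))
    if denominator == 0.0:
        # None of the types is present in this cell: no value can be derived.
        return float("nan")
    numerator = sum(a * fr * k for a, fr, k in zip(scaling, fractions, defaults))
    return numerator / denominator


# Hypothetical cell with three forest types (illustrative numbers only).
forest_fractions = [0.5, 0.3, 0.2]
crop_group_defaults = [2.0, 3.0, 4.0]
crop_group_number = weighted_property(forest_fractions, crop_group_defaults)

# For the crop coefficient, A is the crop height (hypothetical heights in m).
crop_heights = [20.0, 10.0, 5.0]
crop_coef_defaults = [1.0, 1.1, 0.9]
crop_coefficient = weighted_property(
    forest_fractions, crop_coef_defaults, scaling=crop_heights
)
```

With A = 1 the expression reduces to a plain fraction-weighted mean; using the crop height as A gives taller vegetation types more weight in the crop coefficient.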

The generated vegetation property fields, e.g. crop coefficient for forest (see Fig. 11a) and other land covers (see Fig. 11b), follow the main features of, for example, generated forest fraction.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f11

Figure 11. Crop coefficient for forest (left column, panel a) and crop coefficient for other land cover types (right column, panel b) at 3 arcmin (∼5.6 km at the Equator) resolution for the global region.

5.3 Regional examples

All fields in the vegetation properties category are complementary to the land use fractions and help to understand, for example, differences in evaporative water uptake. The fields that are easiest to interpret are the crop coefficient and the crop group number, which are presented for forest in Fig. 12 for the Po River area at 1 and 3 arcmin resolutions and in Fig. 13 for the Amazon River and Brahmaputra river areas at 3 arcmin resolution. For example, fields of crop group number for forest (i.e. different forest types) show a transition of vegetation resilience towards dry conditions in the Brahmaputra river area.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f12

Figure 12. Crop coefficient for forest (upper row, panels a and b) and crop group number for forest (lower row, panels c and d) at 1 arcmin (∼1.9 km at the Equator, left column, panels a and c) and 3 arcmin (∼5.6 km at the Equator, right column, panels b and d) resolutions for the Po River area in Italy.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f13

Figure 13. Crop coefficient for forest (upper row, panels a and b) and crop group number for forest (lower row, panels c and d) at 3 arcmin (∼5.6 km at the Equator) resolution for the Amazon River area (left column, panels a and c) and Brahmaputra river area (right column, panels b and d).

6 Soil properties

6.1 General information

In land surface and distributed hydrological models, the water movement, storage and uptake from the soil are often described by the soil water retention curve (SWRC). The SWRC is derived empirically by measuring how water is retained and released by different soil types. Over time, different SWRCs have been developed and integrated into models. The most widely applied SWRCs are those of Brooks and Corey (Brooks and Corey, 1964), Fredlund and Xing (Fredlund and Xing, 1994), van Genuchten (van Genuchten, 1980), and Gardner (Gardner, 1956). Different SWRC equations require different parameters. Some are shared between SWRC concepts, e.g. those referring to physical soil characteristics such as saturated and unsaturated water content, hydraulic conductivity and pore size, while others only describe the shape of the SWRC function and are not directly related to soil properties. Often, for computational reasons, the soil profile from ground level to bedrock depth is sliced into layers, at the modeller's choice, and the SWRC function is applied to each soil layer. An alternative use of soil properties is for soil moisture calculations.

The dataset includes the variables required to apply the van Genuchten SWRC equations (van Genuchten, 1980) to describe the water dynamics through a vertical soil profile composed of three layers (1, 2, 3). Each variable is required for each soil layer and for forest (_f) or non-forest (_o) land uses, with different soil depths in forest (_f) and non-forest (_o) areas, following root depth values from Allen et al. (1998), referred to as FAO56 (total of 29 variables; names in brackets in italics correspond to the field names in the data repository):

  • Soil profile: surface layer depth (soildepth1_f, soildepth1_o, mm), middle layer depth (soildepth2_f, soildepth2_o, mm), subsoil depth (soildepth3_f, soildepth3_o, mm);

  • Soil hydraulic properties: saturated (thetas1_f, thetas1_o, thetas2_f, thetas2_o, thetas3, m3 m−3) and residual (thetar1, thetar2, thetar3, m3 m−3) volumetric soil moisture content, pore size index (lambda1_f, lambda1_o, lambda2_f, lambda2_o, lambda3, dimensionless), van Genuchten equation parameter (genua1_f, genua1_o, genua2_f, genua2_o, genua3, cm−1), saturated soil conductivity (ksat1_f, ksat1_o, ksat2_f, ksat2_o, ksat3, mm d−1).
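
To illustrate how the listed parameters enter a retention curve, the following minimal Python sketch evaluates the van Genuchten (1980) water retention function for one soil layer. It assumes the usual Mualem constraint m = 1 − 1/n and, as an assumption that should be checked against the LISFLOOD documentation, that the shape parameter is obtained from the pore size index as n = λ + 1. The function name and the parameter values are hypothetical.

```python
def van_genuchten_theta(
    suction_cm: float,
    theta_s: float,
    theta_r: float,
    alpha_per_cm: float,
    pore_size_index: float,
) -> float:
    """Volumetric soil moisture (m3 m-3) at a given suction head (cm, >= 0).

    theta_s, theta_r : saturated and residual moisture content (thetas*, thetar*)
    alpha_per_cm     : van Genuchten parameter in cm-1 (genua*)
    pore_size_index  : dimensionless pore size index (lambda*)
    """
    if suction_cm <= 0.0:
        # At or above saturation the soil holds theta_s.
        return theta_s

    # Assumption: n = lambda + 1, with the Mualem constraint m = 1 - 1/n.
    n = pore_size_index + 1.0
    m = 1.0 - 1.0 / n

    effective_saturation = (1.0 + (alpha_per_cm * suction_cm) ** n) ** (-m)
    return theta_r + (theta_s - theta_r) * effective_saturation


# Hypothetical parameter values for a single layer (illustrative only).
params = dict(theta_s=0.45, theta_r=0.05, alpha_per_cm=0.02, pore_size_index=0.3)
theta_wet = van_genuchten_theta(10.0, **params)      # low suction
theta_dry = van_genuchten_theta(10000.0, **params)   # high suction
```

Because the genua fields are given in cm−1, the suction head in this sketch is expressed in centimetres so that the product of the two is dimensionless.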

6.2 Reference data and methodology

Soil properties are derived from the International Soil Reference and Information Centre (ISRIC) SoilGrids250m global gridded soil information release 2017 (further referred to as SoilGrids250m). For reference data details, see Appendix A. Soil properties are computed for both forested and non-forested (also known in the literature as “others”) areas, expressed as fractions (the main source is the forest fraction based on CGLS-LC100; see Sect. 4.2), where the non-forested area is the complementary fraction of forest. Soil depth layers are derived first and used as input to the soil hydraulic equations used to derive the properties, following a sequential workflow (see Fig. 14 and Table 6). The equations used are from Tóth et al. (2015).

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f14

Figure 14. Workflow to generate the soil-related fields; solid arrows indicate a function transformation, whereas dotted arrows indicate upscaling; dashed boxes indicate the intermediate fields used for other field generation, whereas dotted boxes indicate the fields only used for the soil-related fields; “SoilGrids250m depths” represents fields at the SoilGrids250m native grid and resolution with six default depths, “final grid and resolution” represents fields at the dataset's final grid and resolution; boxes with no explicit indication represent fields at SoilGrids250m native grid and resolution only.

Table 6. Soil property fields, their description and applied transformations; name in brackets in italics next to each field corresponds to the name in the data repository.

Two of the most common soil parameters of land surface and hydrological models, saturated hydraulic conductivity (ksat) and saturated water content, are shown in Fig. 15.

Saturated hydraulic conductivity (ksat; see Fig. 15a) ranges from 2 to 7445 mm d−1. The highest ksat values are concentrated in desert areas such as the Sahara, Arabian Peninsula, Gobi, Patagonian, Sonoran–Mojave, and Kalahari and Namib deserts. Low ksat values between 2 and 18 mm d−1 are found in the Amazon River basin, the lower Mississippi River basin and South East Asia. The ksat field was visually compared against eight global datasets developed with different input data and/or pedotransfer functions (PTFs) (Zhang and Schaap, 2019; Gupta et al., 2021); a general agreement is noticeable in areas that show low variability across all datasets. Northern Russia, Canada, South East Asia and the Sonoran–Mojave desert are the areas with high variability among datasets, with values ranging from very low to very high ksat. Uncertainties in ksat values are primarily due to the limited availability of soil samples and measurements in those areas. Moreover, the climatic context plays a relevant role in clay mineralogy composition, organic composition and soil pore structure (Hodnett and Tomasella, 2002), which influence how water flows through the soil. Therefore, PTFs developed using soil samples collected in temperate areas (such as Europe) are expected to describe a different hydraulic behaviour from those developed using samples collected in tropical climates (Gupta et al., 2021), as also seen in Fig. 15a.

Saturated water content (see Fig. 15b) ranges from 0.27 to 0.79, with 80 % of values between 0.40 and 0.46. A comparison with other global datasets was not carried out; however, uncertainties are expected to be of the same order of magnitude as those of ksat, given that the saturated water content is calculated using bulk density and clay content data.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f15

Figure 15. Saturated soil hydraulic conductivity for forested areas of soil depth layer 2 (in mm d−1) (left column, panel a) and saturated volumetric soil moisture (i.e. water) content for forested areas of soil depth layer 2 (right column, panel b) at 3 arcmin (∼5.6 km at the Equator) resolution for the global region.

6.3 Regional examples

The majority of soil property fields are easy to interpret. Saturated soil conductivity, ksat, and saturated volumetric soil moisture content are presented for forested areas of soil depth layer 2 in Fig. 16 for the Po River area at 1 and 3 arcmin resolutions and in Fig. 17 for the Amazon River and the Brahmaputra river areas at 3 arcmin resolution. The field of saturated soil conductivity for forest shows how easily water can penetrate the soil, depending on forest type. The field of saturated volumetric soil moisture content shows the maximum amount of water that the soil can absorb, depending on forest type. These fields have interesting features over the Brahmaputra river area.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f16

Figure 16. Saturated soil hydraulic conductivity for forested areas of soil depth layer 2 (in mm d−1) (upper row, panels a and b) and saturated volumetric soil moisture (i.e. water) content for forested areas of soil depth layer 2 (lower row, panels c and d) at 1 arcmin (∼1.9 km at the Equator, left column, panels a and c) and 3 arcmin (∼5.6 km at the Equator, right column, panels b and d) resolution for the Po River area in Italy.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f17

Figure 17. Saturated soil hydraulic conductivity for forested areas of soil depth layer 2 (in mm d−1) (upper row, panels a and b) and saturated volumetric soil moisture (i.e. water) content for forested areas of soil depth layer 2 (lower row, panels c and d) at 3 arcmin (∼5.6 km at the Equator) resolution for the Amazon River area (left column, panels a and c) and Brahmaputra river area (right column, panels b and d).

7 Lakes

7.1 General information

Lakes (and reservoirs) are important as they influence not only river discharge variability but also the atmosphere, both regionally and globally. The area covered by lakes can be used for computing evaporation from open water, freshwater storage, unregulated surface water extent, freshwater scarcity indexes and biogenic greenhouse gas emissions, as well as for reproducing different climate mitigation scenarios. The CEMS_SurfaceFields_2022 dataset only includes data on the extent of lakes and not of reservoirs (generally smaller). The lake mask describes the presence of lakes and is consistent with the fraction of inland water field. The field's name in the data repository is lakemask, dimensionless.

7.2 Reference data and methodology

The lake mask field is derived from the Global Lakes and Wetlands Database (further referred to as GLWD). For reference data details, see Appendix A; for workflow, see Table 7.

Table 7. Lake field, its description, data source and transformation; name in brackets in italics next to the lake field corresponds to the name in the data repository.

7.3 Regional examples

The lake mask field is easy to interpret as it shows which grid cells of the fraction of inland water field contain lakes. The lake mask field is presented in Fig. 18 for the Po River area at 1 and 3 arcmin resolutions and in Fig. 19 for the Amazon River and Brahmaputra river areas at 3 arcmin resolution. The figures show the abundance of lakes over the Amazon River area and the detailed lake shapes over the Po River area described by the 1 arcmin resolution field.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f18

Figure 18. Lake mask at 1 arcmin (∼1.9 km at the Equator, left column, panel a) and 3 arcmin (∼5.6 km at the Equator, right column, panel b) resolution for the Po River area in Italy.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f19

Figure 19. Lake mask at 3 arcmin (∼5.6 km at the Equator) resolution for the Amazon River area (left column, panel a) and Brahmaputra river area (right column, panel b).

8 Water demand

8.1 General information

Some environmental models explicitly represent a number of human interventions that affect the water cycle. One of the most common is water demand, which represents the withdrawal of water from natural water sources (e.g. rivers, reservoirs, groundwater) for anthropogenic use. The segregation of the total water demand for anthropogenic use into four main sectors, namely domestic, energy, industrial and livestock water withdrawal, enables a more accurate representation of the processes and follows the Food and Agriculture Organisation of the United Nations (FAO) terminology (Kohli et al., 2012). Domestic water withdrawal represents indoor and outdoor household water use as well as other uses (e.g. industrial and urban agriculture) connected to the municipal system (e.g. water use by shops, schools and public buildings). Electricity (energy) water withdrawal is the water used for the cooling of thermoelectric and nuclear power plants. Water withdrawal for industry is the water used for fabricating, processing, washing, cooling or transporting products and also includes water within the final products and water used for sanitation within the manufacturing facility. Livestock withdrawal is the water demand for drinking and cleaning purposes of livestock.

Higher accuracy in environmental modelling is achieved by differentiating water demand sources and by allocating different levels of priority to different usages. Within LISFLOOD, for instance, water demand for the energy sector and flooded irrigation (rice crops) is supplied by surface water bodies only. Non-flooded irrigation, domestic, industrial and livestock water demands can be supplied by both groundwater and surface water bodies. Moreover, the domestic water demand has the highest priority in case of water scarcity conditions.

It must be noted that the fields of water demand for agriculture are not included in this dataset because LISFLOOD computes crop water demand internally by accounting for climatic conditions, information on land cover (see Sect. 4.2), crop properties (see Sect. 5.2) and soil properties (see Sect. 6.2). Conversely, fields representing the volume of water needed to satisfy the domestic, energy, industrial and livestock demands must be provided as input. Domestic, industrial, energy and livestock water demand volumes have seasonal (e.g. due to temperature differences) and inter-annual variations (e.g. due to population changes and different economic conditions). In order to account for this variability, in LISFLOOD the four sectoral water demand fields provide daily water demand data with monthly or annual variability from 1 January 1979 to 31 December 2019. The water demand values are provided in millimetres per day (mm d−1), with one field per month (the first day of each month is used as the representative timestamp for the entire month) for domestic and energy demand and one value per year (the same field is repeated 12 times per year) for industrial and livestock demand.

Water availability, ecosystem long-term ecological status and anthropogenic needs must be accounted for to evaluate the long-term sustainability of water withdrawals. However, the spatial scales of water use data and available water resources data often do not match due to different ways of data surveying and/or modelling (McManamay et al., 2021; Zhang et al., 2023), and this creates a technical hurdle. Alternative use of the gridded sectoral water demand information is for (i) the statistical analysis of long-term spatiotemporal patterns and trends of water demand, (ii) the evaluation of the long-term sustainability and impacts of water withdrawals (e.g. in connection to remote sensing-derived datasets of surface water extent or groundwater total storage), (iii) the analysis of ecosystem–water–food–energy nexus (Karabulut et al., 2016), (iv) the evaluation of the impacts on water resources of economical and price policies (Dolan et al., 2021), and (v) the analysis of the responses in sectoral water use during hydroclimatic extremes (Belleza et al., 2023).

The CEMS_SurfaceFields_2022 dataset includes water demand for four main sectors (note that each sector consists of 12 fields of daily water demand per year for 41 years (1979–2019), i.e. 492 fields per sector; names in brackets in italics correspond to the field names in the data repository): livestock (liv, mm d−1), industry (ind, mm d−1), energy production (ene, mm d−1) and domestic use (dom, mm d−1). The temporal extent of the water demand fields presented in this paper includes the most recent information on water demand at the time of the dataset's preparation. Readers who are interested in using more recent water demand data are invited to follow the protocol presented in Sect. 8.2 to further extend the provided fields in time.
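
The temporal layout described above can be sketched in a few lines of Python. This is only an illustration of the time axis (one timestamp per month, placed on the first day of the month) and of how an annual value can be repeated over the 12 months of a year; the function names are hypothetical and do not correspond to any LISFLOOD utility.

```python
from datetime import date
from typing import List, Sequence


def monthly_timestamps(first_year: int = 1979, last_year: int = 2019) -> List[date]:
    """First day of every month, used as the timestamp for the whole month."""
    return [
        date(year, month, 1)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


def expand_annual_to_monthly(annual_values: Sequence[float]) -> List[float]:
    """Repeat each annual value 12 times (as for industry and livestock)."""
    return [value for value in annual_values for _ in range(12)]


timestamps = monthly_timestamps()
fields_per_sector = len(timestamps)  # 41 years x 12 months

# Hypothetical annual demand values in mm d-1 for two years.
monthly_from_annual = expand_annual_to_monthly([0.10, 0.12])
```

Since all values are expressed in mm d−1, a monthly total can be approximated by multiplying a field by the number of days in the corresponding month.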

8.2 Reference data and methodology

Global gridded water demand fields with monthly variability were generated for the four sectors using the main data sources listed here and following the transformations summarised in Table 8 (for additional information and extra details, see the GitHub repository https://github.com/ec-jrc/lisflood-utilities/tree/master/src/lisfloodutilities/water-demand-historic, last access: 18 June 2024): (i) AQUASTAT, (ii) United States Geological Survey National Water Information System (further referred to as USGS NWIS), (iii) Global Change Analysis Model (further referred to as GCAM), (iv) the Gridded Livestock of the World (GLW) version 3 (further referred to as GLW3), and (v) the Global Human Settlement Population Grid multitemporal version R2019A (further referred to as GHS-POP). For the full list of reference data and details, see Appendix A.

The water demand values are provided in mm d−1 and one field per month from 1 January 1979 to 31 December 2019 (the first day of each month is used as the representative timestamp for the entire month). The methodology applied largely follows Huang et al. (2018), with the key differences being the use of freely available datasets and the higher resolution of the resulting fields. Spatial downscaling was achieved following the approach by Hejazi et al. (2014); temporal downscaling was performed following the approaches by Wada et al. (2011), Voisin et al. (2013) and Huang et al. (2018). It should be noted that country-scale estimates (from AQUASTAT) were integrated with state-level water withdrawal estimates (from USGS NWIS). The protocol for the integration of local information with global data sources was developed for further use in the future to enable the integration of other regional or national datasets as soon as they become available.

Table 8. Water demand fields, their description, data source and applied transformations; cells with bold font show required intermediate fields; name in brackets in italics next to each field corresponds to the name in the data repository.

To the best of the authors' knowledge, no other publicly accessible temporally varying global water demand field set exists (only static datasets). A rigorous validation of the temporally varying water demand fields is not straightforward at the global scale, as the only comprehensive global data source, FAO AQUASTAT, was used to create the fields.

8.3 Regional examples

In general, fields in the water demand category are easy to interpret as they show how much water per day is needed to satisfy certain types of human-induced needs. In practice, water demand fields mainly cover urbanised areas and are spatially scattered (i.e. the fields are not spatially continuous), with relatively small variations in field values from month to month. An example for domestic water use is presented for August 2018 in Fig. 20 for the Po River area at 1 and 3 arcmin resolutions and in Fig. 21 for the Amazon River and Brahmaputra river areas at 3 arcmin resolution.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f20

Figure 20. Water demand for domestic use in mm d−1 at 1 arcmin (∼1.9 km at the Equator, left column, panel a) and 3 arcmin (∼5.6 km at the Equator, right column, panel b) resolution for the Po River area in Italy.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f21

Figure 21. Water demand for domestic use in mm d−1 at 3 arcmin (∼5.6 km at the Equator) resolution for the Amazon River area (left column, panel a) and Brahmaputra river area (right column, panel b).

9 Data, access, licensing and documentation

CEMS_SurfaceFields_2022 is an open-source dataset of the Copernicus Emergency Management Service describing key components of the Earth surface generally required in environmental and hydrological modelling, including Earth system modelling and numerical weather prediction. The dataset includes static fields (e.g. forest fraction), yearly cycle fields (e.g. 10 d average LAI, in total 36 fields) and yearly varying fields (e.g. water demand). The surface fields are based on 25 different sources, including global and regional high-resolution (up to 100 m) gridded and vector datasets. They were processed into two sets of fields, (i) at 1 arcmin resolution (∼1.86 km at the Equator) over Europe (72.25° N/22.75° N, 25.25° W/50.25° E; 4530×2970 grid cells) and (ii) at 3 arcmin resolution (∼5.57 km at the Equator) over the globe (90.00° N/90.00° S, 180.00° W/180.00° E; 7200×3600 grid cells), to provide an up-to-date surface state for six main field groups: (1) catchment morphology and river network, (2) land use fields, (3) vegetation properties, (4) soil properties, (5) lakes and (6) water demand.
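
The grid dimensions and approximate cell sizes quoted above follow directly from the domain extents and the angular resolution. The short Python sketch below reproduces this arithmetic; the function names are hypothetical, and the equatorial circumference is an assumed constant derived from the WGS84 equatorial radius of 6378.137 km.

```python
# Assumed constant: equatorial circumference from the WGS84 radius (6378.137 km).
EQUATOR_CIRCUMFERENCE_KM = 40075.017


def grid_shape(north: float, south: float, west: float, east: float,
               resolution_arcmin: float) -> tuple:
    """Number of (columns, rows) for a regular lat/lon grid in degrees."""
    cell_deg = resolution_arcmin / 60.0
    n_cols = round((east - west) / cell_deg)
    n_rows = round((north - south) / cell_deg)
    return n_cols, n_rows


def cell_size_km_at_equator(resolution_arcmin: float) -> float:
    """East-west extent of one grid cell at the Equator, in km."""
    return EQUATOR_CIRCUMFERENCE_KM * resolution_arcmin / (360.0 * 60.0)


# European domain at 1 arcmin (west longitudes are negative).
europe_shape = grid_shape(72.25, 22.75, -25.25, 50.25, 1.0)

# Global domain at 3 arcmin.
global_shape = grid_shape(90.0, -90.0, -180.0, 180.0, 3.0)

size_1_arcmin = cell_size_km_at_equator(1.0)
size_3_arcmin = cell_size_km_at_equator(3.0)
```

Away from the Equator the east-west cell size shrinks roughly with the cosine of latitude, which is why the quoted kilometre values are upper bounds for most of the European domain.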

The CEMS_SurfaceFields_2022 dataset consists, in total, of 140 gridded fields at EPSG:4326 – WGS84: World Geodetic System projection in NetCDF format with information on Earth's surface state (see Table 9 for the full list of fields), which are grouped thematically into sub-folders. The 1 arcmin European fields have a total memory storage volume of 9.3 GB, and the 3 arcmin global fields have a total volume of 22.7 GB. The CEMS_SurfaceFields_2022 dataset is freely available for download from the Joint Research Centre (JRC) Data Catalogue (https://data.jrc.ec.europa.eu/, last access: 18 June 2024). The set of global surface fields at 3 arcmin resolution can be found here (JRC Data Catalogue – LISFLOOD static and parameter maps for GloFAS – European Commission (europa.eu), https://data.jrc.ec.europa.eu/dataset/68050d73-9c06-499c-a441-dc5053cb0c86, last access: 18 June 2024), and the set of surface fields for the European domain at 1 arcmin resolution can be found here (JRC Data Catalogue – LISFLOOD static and parameter maps for Europe – European Commission (europa.eu), https://data.jrc.ec.europa.eu/dataset, last access: 18 June 2024). The README.txt file that can be found there contains the basic description of each surface field, including general information, data description, file overview, methodological information, and data access and sharing information. For a detailed technical description of how the surface fields were generated, refer to the LISFLOOD user guide, available online: https://ec-jrc.github.io/lisflood-code/4_Static-Maps-introduction/, last access: 18 June 2024. The changelog.txt file provides users with information on updates to the datasets. The copyright.txt file provides information about the data licence (CC BY 4.0).

Table 9. Full list of surface fields with short description and units included in CEMS_SurfaceFields_2022 dataset; name in italics corresponds to the field's file name in the data repository.

Whilst the CEMS_SurfaceFields_2022 dataset follows the strict requirements of the LISFLOOD-OS model (e.g. format, treatment of missing values, number of soil layers), it can also be used outside the LISFLOOD context, in full or in part, for applications such as risk assessment modelling. The workflow and methodology used to generate the dataset and published in this paper can be used as a reference and can be easily modified if further adaptation of the dataset is needed (e.g. using different sets of equations to describe the soil properties or sourcing new or more relevant local datasets).

10 Conclusion

The Earth's surface has a strong impact on the surface energy and water balance that drives lower-atmosphere weather conditions and river discharge fluctuations. Depending on the surface type (e.g. land use, terrain or soil), weather in the region can be colder/warmer, more/less humid, drier/rainier, and/or calmer/windier than its surroundings. Depending on the surface type, the terrestrial water cycle can differ, with water infiltrating more/less in the soil, leaving as evaporation at a higher/lower rate, and reaching rivers faster/slower. Surface information is provided by land use and ecosystem type (e.g. forest, rice paddy, bare ground, urban), river geometry (e.g. channel width, channel length), and soil properties (e.g. depth, porosity, hydraulic properties), amongst others.

Information on the underlying surface can be accounted for in Earth system and environmental models (e.g. atmospheric, hydrological, etc.) to simulate the evolution in space and time of water, energy and carbon cycles. If artificial influences and human intervention are included within the modelled processes (e.g. irrigation or water management through reservoirs), the information required to describe the processes must also be integrated within the modelling framework. Generally, this is achieved through a set of independent files used as input to the models.

Because of the temporal non-stationarity of some surface fields, typically associated with human intervention such as land use and water use but also due to climatic variation such as lake extent (new lakes forming or lakes shrinking), input surface fields must be as representative as possible of the simulated period of interest. For medium-range forecasting systems, for example, this should be as close to the present as possible. When simulating long periods, especially past or future decades, the results must be treated with caution. This is especially true if some surface fields that have substantially changed during the simulation period do not explicitly incorporate time and instead are based on the most recent period. The most recent period may not be representative of the full study period and can introduce substantial biases that grow with time. The same applies if surface fields are used for collecting statistical data in general, as statistics based on stationary fields only represent the period used to generate the stationary field in question.

In addition, in recent years the horizontal resolution of global Earth system and environmental models has been constantly increasing, reaching the kilometre-scale milestone. This has been supported by the technological developments in the field of high-performance computing and the wealth of high-resolution datasets freely available. This imposes another condition on the input surface fields: they must be of rather high horizontal resolution (i.e. ∼2 and 6 km at the Equator).

Thanks to the availability of a wide range of high-resolution environmental data derived from the use of ground, unconventional and satellite measurement sensors, new high-resolution datasets describing the Earth's surface are nowadays released regularly. Even though each dataset may have very low absolute and root mean square errors when compared with available independent data, merging different datasets for modelling purposes (e.g. to model hydrological surface parameters) might lead to questionable results and even to a model crash, due to possible discontinuity or inconsistency in the combined datasets. In the specific case of hydrological modelling where river flow is also represented, high horizontal resolution does not guarantee better modelling per se. Sources of potentially large errors can be easily hidden in high-resolution datasets. This is the case, for instance, for errors in the digital elevation models when they are used to obtain river drainage networks. Small errors in the elevation of a grid cell can lead to a totally inaccurate representation of the location and the direction in which the river is flowing in the model compared to reality. Mislocating a river or having a slightly inaccurate catchment area can be a trivial inaccuracy for most applications, but it can also lead to a missed flood warning for thousands of people within a flood-awareness system. To benefit from different recent high-resolution datasets based on satellite and ground measurements, it is essential that a well-defined, thorough workflow is designed and implemented so that the final products are consistent and compatible with each other and can be used in combination.

The work presented in this paper is focused not only on the final surface field generation (i.e. CEMS_SurfaceFields_2022), but also on deriving a robust, reproducible methodology that could be reapplied once new versions of 25 or fewer input sources are released. Understanding the methodology applied helps to interpret values in the final surface fields and possibly even the results of numerical models that use these surface fields. The collection of input sources and their preparation for actual use is a very important step, as it includes going through all technical documentation, comparison and verification of papers, and the investigation of the actual data, as well as data gridding, interpolation, and scaling. All input sources for CEMS_SurfaceFields_2022 are ranked according to their quality and how up to date they are, in order to favour one value in ambiguous situations when several datasets provide different information for the same location. Consistency checks between all surface-type fractions are carried out to address the issue of ambiguity during the merging of information from different origins (i.e. adjust fractions to sum to one in each grid cell). Some fields, like forest fraction, were rather straightforward to create from the available source, yet it was noted that prior correction of the source was needed to delete erroneous forest grid cells from Fox Basin in Canada (the mismatch was only spotted during the investigation of the actual data, as it was absent from the documentation). Other fields, like soil hydraulic properties, are created not only from the source information, but also from the forest fraction, which had to be generated beforehand. The soil hydraulic property methodology also includes several steps that have to be performed at the native resolution (i.e. 250 m) using information from several global fields simultaneously, which becomes technically and computationally challenging.
Surface fields with clear multi-annual changes, like water demand maps, are created using temporal interpolation and extrapolation from multiple data sources to create time series fields. A final and non-trivial task is to have all resulting fields on an identical grid without deterioration of the actual value precision, even after several file type translations (e.g. the local drainage direction field can be automatically checked and corrected if needed for required boundaries only in PCRaster format, not NetCDF). Due to the number of data sources and surface fields required to represent the main variables (i.e. 70) used in Earth system and environmental models, the overall effort to generate the CEMS_SurfaceFields_2022 dataset (both human and computing resources) was substantial.
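
The consistency check on surface-type fractions mentioned above can be illustrated with a minimal sketch. It assumes a simple proportional rescaling so that the fractions in each grid cell sum to one; the function name, the array layout and the rescaling rule are illustrative assumptions and do not reproduce the ranking-based procedure of the actual workflow.

```python
import numpy as np

def normalise_fractions(fractions, eps=1e-12):
    """Rescale surface-type fractions so that they sum to one in every grid cell.

    fractions: array of shape (n_types, ny, nx) with non-negative values.
    Proportional rescaling is an assumption made for this sketch only.
    Cells whose fractions sum to (almost) zero are returned unchanged.
    """
    fractions = np.asarray(fractions, dtype=float)
    total = fractions.sum(axis=0)
    # Avoid division by zero where no surface type is present.
    safe_total = np.where(total > eps, total, 1.0)
    return fractions / safe_total

# Hypothetical 1 x 2 grid with three surface types (forest, water, other).
forest = np.array([[0.6, 0.2]])
water = np.array([[0.3, 0.2]])
other = np.array([[0.3, 0.4]])
normalised = normalise_fractions(np.stack([forest, water, other]))
```

In this example the first cell sums to 1.2 and the second to 0.8 before rescaling; afterwards both sum to one while the relative proportions of the surface types are preserved.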

The CEMS_SurfaceFields_2022 dataset is a new data source open to all, offering high-quality data describing the Earth's surface at kilometre-scale resolution and providing an exceptional opportunity for the research and scientific community to extend and multiply European and global applications in wide-ranging fields of the water–energy–food nexus. The potential uses of the CEMS_SurfaceFields_2022 surface fields are vast; only a few are listed here. Standard deviation values of elevation and other orographic sub-grid parameters are critical for radiation parameterisation, especially for the shadowing effect. Channel geometry fields are vital to describe overbank inundation and infer inundated areas in wetland methane and soil carbon modelling. Land use fractions are needed for skin temperature calculations, biogenic flux calculations, urban planning and climate mitigation plan preparation. LAI use includes biomass allocation, which can be used for fire danger forecasting and carbon stock monitoring. Rice planting/harvesting days are important for the yearly cycle of methane modelling. Soil properties are used for soil moisture calculations. The area covered by lakes can be used for computing evaporation from open water, freshwater storage, unregulated surface water extent, freshwater scarcity indexes and biogenic greenhouse gas emission, as well as for reproducing different climate mitigation scenarios. All of the above shows that the CEMS_SurfaceFields_2022 surface fields can be used for weather prediction, Earth system modelling, hydrological and environmental modelling, or statistical analysis in general, with a spatial scale allowing for global, regional, and even national applications.

Appendix A

All data sources used to produce the dataset's surface fields, mentioned in Sects. 3 to 9, are described here. All data considered were open source, freely available, updated as recently as possible and with recognised references for their quality.

A1 Catchment morphology and river network

The MERIT DEM: Multi-Error-Removed Improved-Terrain digital elevation model v.1.0.3 (15 October 2018) (further referred to as MERIT DEM) is a high accuracy global DEM at 3 arcsec resolution (∼ 90 m at the Equator) covering land area from 90° N to 60° S, selected for its ability to clearly represent landscapes such as river networks and hill–valley structures even in flat areas where height errors could be larger than topography variability (Yamazaki et al., 2017; Bhardwaj, 2021; Chai et al., 2022). It is derived from seven different open-source datasets, delivered as 57 GeoTIFF files, 30° by 30° region each, at ∼ 90 m resolution (in total 90.0 GB), representative of the year 2018. More details on the method, data content and access can be found in Yamazaki et al. (2017) and the MERIT DEM webpage (http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_DEM, last access: 18 June 2024).

The MERIT DEM was used to compute standard deviation of elevation, gradient and channel geometry fields.

The Catchment-based Macro-scale Floodplain (CaMa-Flood) Global River Hydrodynamics Model v4.0 maps (further referred to as CaMa-Flood) are used for the basic maps describing all physical properties of the river network. They are derived from MERIT Hydro for high-resolution river routing applications using the FLOW algorithm (Yamazaki et al., 2009, 2011). MERIT Hydro is a global hydrography dataset created by using elevation (i.e. MERIT DEM) and several inland water maps; more details can be found in Yamazaki et al. (2019) and the MERIT Hydro webpage (http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro, last access: 18 June 2024). The maps include information on channel length, river topography parameters, floodplain elevation profile, channel width and channel depth. The maps exist at 15, 6, 5, 3 and 1 arcmin resolutions covering land area from 90° N to 60° S, representative of the year 2017, and for each resolution they are available as one single file with all variables in NetCDF format (for 1 arcmin 737.0 MB). More details on the method, data content and access can be found in Yamazaki et al. (2011) and the CaMa-Flood webpage (http://hydro.iis.u-tokyo.ac.jp/~yamadai/cama-flood/index.html, last access: 18 June 2024). Note that whilst the CaMa-Flood maps were originally generated for the specific use of the CaMa-Flood model, they can also serve as a basis to derive alternative maps for other environmental models, as done here.

The CaMa-Flood maps were used to create the local drainage direction (LDD), upstream drainage area, channel geometry and land masks fields.

A2 Land use fields

The Copernicus Global Land Service (CGLS) Land Cover (LC) 100 m map (further referred to as CGLS-LC100) is a global land cover map of the year 2015 (Buchhorn et al., 2020). It is derived from the PROBA-V 100 m satellite image collection, a database of high-quality land cover training sites and ancillary datasets, reaching an accuracy of 80 % at Level 1 (Buchhorn et al., 2021). It contains 23 classes for discrete classification and 10 classes for continuous cover fractions; it is delivered as 15 files in GeoTIFF format (in total 39.3 GB) at 100 m resolution covering land area from 90° N to 60° S and representative of the year 2015. More details on the method, data content and access can be found in Buchhorn et al. (2021) and the Copernicus website (https://land.copernicus.eu/global/products/lc, last access: 18 June 2024).

The CGLS-LC100 was used to generate crop parameters and Manning's surface roughness coefficient for forest and other land cover types, and to generate forest, inland water and sealed surface fraction fields, following a basic quality check on large water bodies (i.e. correcting Fox Basin and Caspian Sea).

The Coordination of Information on the Environment (CORINE) Land Cover (CLC) inventory for 2018 (further referred to as CLC2018) is a set of maps describing the land cover/land use status of 2018 covering 39 countries in Europe with a total area of over 5.8 × 10⁶ km². The dataset is derived from satellite imagery (mainly Sentinel-2, based on a constellation of two satellites orbiting Earth at an altitude of 786 km (180° apart) revisiting the Equator every 5 d, and for gap filling Landsat-8 data, making a constellation together with the Landsat-9 satellite orbiting Earth at an altitude of 705 km, each revisiting the Equator every 16 d) and in situ data and contains 44 classes, delivered as one GeoTIFF raster file (125.0 MB) at 100 m resolution covering land area over Europe, representative of the time period 2017–2018. The overall accuracy for CLC2018 is 92 % for the blind analysis (i.e. validation team had no knowledge of the CLC2018 thematic classes), but there are regional variations: the Black Sea geographical region has the lowest accuracy of 84 %; country-wise overall accuracy varies from 86 % for Portugal to 99 % for Iceland, lowest accuracy being linked to the landscape complexity (Moiret-Guigand, 2021). More details on the method, data content and access can be found in Büttner and Kosztra (2017), Moiret-Guigand (2021), and the Copernicus website (https://land.copernicus.eu/pan-european/corine-land-cover/clc2018, last access: 18 June 2024).

The CLC2018 was used to generate the irrigated crop fraction and rice fraction fields.

The Spatial Production Allocation Model (SPAM) – Global Spatially-Disaggregated Crop Production Statistics Data for 2010 v2.0 (further referred to as SPAM2010) is a global dataset generated in 2020 which redistributes crop production information from country and sub-national province levels to a finer grid-cell level (IFPRI, 2019). It is derived from numerous data sources, including crop production statistics, cropland data, biophysical crop “suitability” assessments, spatial distribution of specific crops or crop systems, and population density. SPAM2010 contains estimates of crop distributions within disaggregated units (based on a cross-entropy approach) for 42 crops and two production systems (irrigated and rainfed), and it is delivered as 84 files in shapefile format at 10 km (5 arcmin) resolution covering land area from 90° N to 60° S and representative of the year 2010 (in total 2.2 GB). Based on crop expert judgement from international (i.e. International Rice Research Institute, International Maize and Wheat Improvement Center) and national organisations (i.e. the Chinese Academy of Agricultural Sciences), SPAM2010 over Europe and America is more accurate than over Africa and South East Asia, with best performance in allocating rice; grid-by-grid comparison of crop areas with the independent Cropland Data Layer (produced by using satellite images and a vast amount of ground truth data) over the continental United States shows a coefficient of determination (R²) of 0.7–0.9 and root mean square error (RMSE) of 231–307 ha, indicating a relatively high reliability, with highest R² and lowest RMSE values for maize and soybean (Yu et al., 2020). More details on the method, data content and access can be found in Yu et al. (2020) and the MapSPAM website (https://mapspam.info, last access: 18 June 2024).

SPAM2010 was used to compute the irrigated crop and rice fractions, crop parameters, and Manning's surface roughness coefficient for irrigated crop fields.

A3 Vegetation properties

The Food and Agriculture Organisation (FAO) of the United Nations Irrigation and Drainage Paper No. 56 (further referred to as FAO56) is a publication covering geographically referenced statistics for crop development stages, crop coefficients, crop height, rooting depth and soil water depletion fraction for common crops found across the world; it also covers procedures for information aggregation, e.g. on the grid. It is delivered as an article with a set of tables and equations and can be considered the most complete source of information on crop properties. More details on the method and data content can be found in Allen et al. (1998) and the FAO online crop information webpage (http://www.fao.org/land-water/databases-and-software/crop-information/tobacco/en/, last access: 18 June 2024).

FAO56 was used to compute the crop coefficients for forest, irrigated crops and other land cover types (online crop information was specifically used for tobacco) and for intermediate computations such as depletion fraction for different crop and surface types (table), crop height, and root depth fields.

Intara et al. (2018) is a publication covering oil palm roots architecture.

Intara et al. (2018) was used for oil palm root depth information in addition to FAO56.

Burek et al. (2014) is a publication covering summarised information for crop coefficients, rooting depth, crop group number and Manning's surface roughness coefficient for different surface types.

Burek et al. (2014) was used for built-up, bare/sparse vegetation, snow and ice, permanent inland water, ocean and seas, herbaceous wetland, moss and lichen surface type crop coefficients, rooting depth, crop group number, and Manning's surface roughness coefficient information in addition to FAO56 and other sources.

The WOFOST 6.0 crop simulation model description (further referred to as SUPIT) is a publication on developing, validating, and testing new or already existing agrometeorological models (Supit et al., 1994). It contains crop group information for several crops as examples and relations for a crop group from water depletion fraction. The publication is delivered as a book with a set of tables and equations. Information on crop group is still considered up to date. More details on the method and data content can be found in Supit et al. (1994).

SUPIT was used to compute the crop group fields for forest, irrigated crops and other land cover types.

The Open-Channel Hydraulics manual (further referred to as CHOW) is a publication on open-channel hydraulics, including basic principles and different types of flows, i.e. uniform, gradually varied, rapidly varied and unsteady (Te Chow, 1959). It contains information on roughness coefficient over different surfaces. The publication is delivered as a book with a set of tables and equations. More details on the method and data content can be found in Te Chow (1959).

CHOW was used to compute the Manning's surface roughness coefficient fields for forest, irrigated crops and other land cover types.

The Copernicus Global Land Service (CGLS) Leaf Area Index (LAI) 1 km Version 2 collection (further referred to as CGLS-LAI) is a set of global maps without missing data describing vegetation dynamics – the annual evolution of LAI at 10 d intervals over the period of 1999–2020. The dataset is derived from SPOT/VEGETATION and PROBA-V data. The dataset's root mean square deviation over 20 ground-based observations for validation sites over the period 2014–2018 is 0.92 compared to 1.19 for MODIS C6 LAI product (Martínez-Sánchez, 2020). The dataset is delivered as one multi-band file per year in NetCDF (netCDF4 CF-1.6) format (14.7 GB yr−1) at 1 km resolution, covering land area from 90° N to 60° S and representative of the 10-year period of 2010–2019. More details on the method, data content and access can be found in Smets (2019), Martínez-Sánchez (2020), and the Copernicus website (https://land.copernicus.eu/global/products/lai, last access: 18 June 2024).

CGLS-LAI was used to compute the LAI fields for forest, irrigated crops and other land cover types.

The RiceAtlas v3 (further referred to as RiceAtlas) is a spatial database of global rice calendars and production. It contains information on start, peak, and end dates of sowing; transporting; and harvesting rice, derived from global and regional databases, national publications, online reports, and expert knowledge. It is delivered as seven files in shapefile format (in total 195.8 MB) for administrative units (in total 2725 spatial units) at 1 km resolution for the national production totals to match the years 2010–2012 (Laborte et al., 2017a). RiceAtlas is ∼ 10 times more spatially detailed and has ∼ 7 times more spatial units when compared with other global datasets (Laborte et al., 2017b). More details on the method, data content and access can be found in Laborte et al. (2017a) and Laborte et al. (2017b).

RiceAtlas was used to compute rice planting and rice harvesting days for three different seasons.

A4 Soil properties

The International Soil Reference and Information Centre (ISRIC) SoilGrids250m global gridded soil information release 2017 (further referred to as SoilGrids250m) is an output of spatial predictions produced by the SoilGrids system (Hengl et al., 2014), as a set of global soil property and class maps at 250 m resolution. It is derived from soil profile data (from ∼ 150 000 sites globally) with the use of machine learning and contains information on soil characteristics at six standard depths, including soil textures (clay, silt, sand), depth to bedrock, bulk density, organic carbon, pH and cation exchange capacity. It is delivered as 43 files in GeoTIFF format (in total 111.8 GB) at 250 m resolution, covering land area with no permanent ice and representative of the year 2010 (according to land cover) (Hengl et al., 2017). Comparisons of SoilGrids250m pH with Soil Survey Geographic Database data over California (depth 0–200 cm) and with Soil and Landscape Grid of Australia data over Tasmania (depth 0–5 cm) show high correlation, 0.79 and 0.71, respectively (Hengl et al., 2017). Despite its limited accuracy (i.e. between 30 % and 70 %, according to the SoilGrids website) due to the scarcity of soil profile observations (especially in Central Asia, Arctic regions, coastal areas and deserts), the low resolution of covariate data and algorithms, it was selected as the most recent source of information. More details on the method, data content and access can be found in Hengl et al. (2017) and the SoilGrids250m website (https://www.isric.org/explore/soilgrids/faq-soilgrids-2017, last access: 18 June 2024).

SoilGrids250m was used to compute the soil depth and soil hydraulic properties for forest and non-forest.

A5 Lakes

The Global Lakes and Wetlands Database (further referred to as GLWD) is a global database of water bodies. It is derived from a combination of global and regional lake datasets, registers and inventories (i.e. point information with descriptive attributes), and digital maps (i.e. polygons, rasterised global land cover and land use maps). The database consists of two global files in shapefile format at spatial resolutions of up to 1:1 million – GLWD-1 with 3067 largest lake and 654 largest reservoir polygons (6.4 MB) and GLWD-2 with ∼ 250 000 smaller lake and reservoir polygons (32.0 MB); there is one global file in the ADF raster format at 30 arcsec resolution – GLWD-3 combines GLWD-1, GLWD-2 and additional information (8.9 MB). Validation against documented data shows that GLWD represents good wetland maximum extent and comprehensively describes lakes with surface area greater than or equal to 1 km² (Lehner and Döll, 2004). More details on the method, data content and access can be found in Lehner and Döll (2004) and the GLWD website (https://www.worldwildlife.org/, last access: 18 June 2024).

GLWD (i.e. only GLWD-1 and GLWD-2) was used to compute the discrete lake mask field.

A6 Water demand

AQUASTAT is the FAO's global information system on water resources and agricultural water management. AQUASTAT collects information on water use via the network of AQUASTAT national correspondents, who are required to fill in the annual questionnaire and collaborate with the AQUASTAT team in the data validation process. Five types of manual checks are followed by automatic implementation of almost 200 validation rules. The dataset includes data for 180 countries worldwide; yearly data from 1979 to 2019 were used to produce the maps presented in this paper. Float, lumped values for each country for the variables “Gross Domestic Product (GDP)”, “Industry, value added to GDP”, “Agricultural water withdrawal”, “Industrial water withdrawal”, “Municipal water withdrawal”, “Total water withdrawal”, and “Irrigation water withdrawal” were obtained in CSV format (2 files, in total 2.0 MB) from the AQUASTAT data acquisition dashboard (https://tableau.apps.fao.org/views/ReviewDashboard-v1/country_dashboard, last access: 18 June 2024). More details on the method, data content and access can be found on the AQUASTAT website (https://www.fao.org/aquastat/en/overview/methodology/, last access: 18 June 2024).

AQUASTAT variables were used accordingly to compute water demand fields for domestic, industrial, energy, and livestock use.

The United States Geological Survey National Water Information System (further referred to as USGS NWIS) is a national database on water use data for the United States (US) with annual statistics provided every 5 years since 1950. The water use data are best estimates produced by the USGS in cooperation with local, state and federal agencies as well as academic and private organisations. The water use data are lumped values (float numbers) for each state, delivered in plain text format (52 files, in total 56.0 MB). The following variables were used: “Domestic total self-supplied withdrawals, fresh, in Mgal/d” (1 Mgal/d = 43.81 L s−1), “Public Supply total self-supplied withdrawals, fresh, in Mgal/d”, “Industrial total self-supplied withdrawals, fresh, in Mgal/d”, “Total Thermoelectric Power total self-supplied withdrawals, fresh, in Mgal/d”, “Total Thermoelectric Power power generated, in gigawatt-hours”, and “Livestock total self-supplied withdrawals, fresh, in Mgal/d”. More details on the method, data content and access can be found on the USGS NWIS website (https://waterdata.usgs.gov/nv/nwis/wu, last access: 18 June 2024). For this study, data from 1985 to 2015 were used.
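
The conversion factor quoted above for Mgal/d can be checked with a short sketch. It assumes that "gal" refers to the US liquid gallon (3.785411784 L by definition); the function name is illustrative only.

```python
US_GALLON_IN_LITRES = 3.785411784  # definition of the US liquid gallon
SECONDS_PER_DAY = 86_400

def mgal_per_day_to_litres_per_second(mgal_per_day):
    """Convert million US gallons per day (Mgal/d) to litres per second."""
    return mgal_per_day * 1.0e6 * US_GALLON_IN_LITRES / SECONDS_PER_DAY

# Conversion factor for 1 Mgal/d; rounds to the 43.81 L s-1 quoted in the text.
factor = mgal_per_day_to_litres_per_second(1.0)
```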

USGS NWIS variables were used accordingly to refine the global water demand fields for the domestic, industrial, energy, and livestock use sectors for the US.

The Global Change Analysis Model (further referred to as GCAM) is an integrated, multi-sector model developed by the Joint Global Change Research Institute (JGCRI) to explore the overall behaviour of human and physical systems dynamics and interactions. GCAM includes five main systems. One of these systems, the water module, provides information about water withdrawals for energy, agriculture and municipal uses as lumped values of 235 hydrologic basins; a detailed explanation can be found in Calvin et al. (2019). Estimates of industrial, thermoelectric water withdrawals (energy sector) and electricity consumption were computed by running the GCAM model; the outputs used are two files in CSV format (in total 4.0 MB). Data from the following sectors were used: “biomass”, “electricity”, “nuclearFuelGenII”, “nuclearFuelGenIII”, “regional coal”, “regional natural gas”, “regional oil”, “SheepGoat”, “Beef”, “Dairy”, “Pork”, and “Poultry”. More details on the method, data content and access can be found in the documentation of the open source package (https://github.com/JGCRI/gcam-core/tree/gcam-v6.0, last access: 18 June 2024).

GCAM variables were used accordingly to estimate water withdrawals for industrial, energy, and livestock use.

Global-scale gridded estimates of thermoelectric power and manufacturing water use (further referred to as Vassolo and Döll, 2005) is a global-scale gridded estimate of water withdrawal for cooling of thermal power stations and for manufacturing. Estimates of values for the year 1995 are provided with a spatial resolution of 0.5° by 0.5°. Thermoelectric power water use is based on the geographical location of 63 590 thermal power stations. Manufacturing water use is computed by estimating country-specific water withdrawal values and spatial downscaling using city night-time lights. Dataset verification of Vassolo and Döll (2005) showed satisfactory representation of thermoelectric power water use but high uncertainty in the representation of manufacturing water use. The data are delivered as one shapefile (2.5 MB). More details on the method, data content and validation, and data access can be found in Vassolo and Döll (2005).

The Vassolo and Döll (2005) dataset was used for the computation of water demand fields for energy.

The Gridded Livestock of the World (GLW) version 3 (further referred to as GLW3) is a spatial gridded dataset of the global distribution of eight livestock species for 2010. It is delivered as eight GeoTIFF files at 0.083333° (∼ 10 km at the Equator) resolution (in total 208.0 MB). The species abundance was converted to total livestock mass. More details on the method, data content and access can be found in Gilbert et al. (2018).

GLW3 was used to spatially disaggregate the water demand for livestock use.

World Bank manufacturing value added and gross domestic product (further referred to as World Bank) data provide “Manufacturing, value added (constant value for 2015, USD)” values (further referred to as MVA) and “Gross Domestic Product GDP (constant value for 2015, USD)” values. The data are provided as a table, downloaded in CSV format (six files, in total 6.0 MB) from https://data.worldbank.org, last access: 18 June 2024.

The World Bank dataset was used to temporally downscale the values of water demand fields for the industrial and energy sectors.

The Global Human Settlement Population Grid multitemporal version R2019A (further referred to as GHS-POP) is a spatial raster dataset that depicts the distribution of population, expressed as the number of people per grid cell (Freire et al., 2016; Florczyk et al., 2019; Schiavina et al., 2019). GHS-POP residential population estimates for target years provided by the Center for International Earth Science Information Network GPWv4.10 were disaggregated from census or administrative units to grid cells, informed by the distribution and density of built-up as mapped in the Global Human Settlement Layer. The dataset has a spatial resolution of 9 arcsec (∼ 300 m at the Equator) and is delivered as individual files in GeoTIFF format for 1975, 1990, 2000 and 2015 (four files, in total 6.5 GB; available online: https://ghsl.jrc.ec.europa.eu/ghs_pop2019.php, last access: 18 June 2024).

GHS-POP was used to spatially disaggregate the country, state, and basin-level information for domestic, industrial, and energy water withdrawals.
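
As an illustration of what spatial disaggregation of lumped values can look like, the sketch below spreads a regional total over grid cells in proportion to population. Proportional weighting, the function name and the toy arrays are assumptions made for this example; the sketch is not the procedure actually applied to generate the water demand fields.

```python
import numpy as np

def disaggregate_by_population(region_ids, population, region_totals):
    """Spread lumped regional totals over grid cells in proportion to population.

    region_ids:    integer array (ny, nx) giving the region of each grid cell.
    population:    float array (ny, nx) with the number of people per grid cell.
    region_totals: dict mapping region id to its lumped value (e.g. a withdrawal).
    Regions without any population receive zero everywhere (an assumption).
    """
    region_ids = np.asarray(region_ids)
    population = np.asarray(population, dtype=float)
    gridded = np.zeros_like(population)
    for region, total in region_totals.items():
        mask = region_ids == region
        region_population = population[mask].sum()
        if region_population > 0:
            gridded[mask] = total * population[mask] / region_population
    return gridded

# Hypothetical 2 x 2 grid with two regions.
region_ids = np.array([[1, 1], [2, 2]])
population = np.array([[30.0, 10.0], [0.0, 5.0]])
demand = disaggregate_by_population(region_ids, population, {1: 100.0, 2: 20.0})
```

By construction, the gridded values of each populated region add up to the lumped value of that region, so the regional totals are conserved.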

Thematic Mapping Country Borders shapefile (further referred to as TM “country borders”) was derived from Thematic Mapping™, which is a tool enabling web browsers to create thematic maps and associated world datasets. For this work, the TM World Borders Dataset was downloaded as one shapefile (10.0 MB); more details on the method, data content and access can be found at https://larmarange.github.io/prevR/reference/TMWorldBorders.html, last access: 18 June 2024 (originally accessed at http://thematicmapping.org/downloads/, last access: 21 January 2024, no longer available online). The United States Census Bureau Cartographic Boundary Files – shapefile – (further referred to as US CB) provides the State boundaries for the USA. For this work, the 2018 version was retrieved as one shapefile (3.2 MB; available online: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html, last access: 18 June 2024).

TM “country borders” and US CB were used to spatially disaggregate the information of water withdrawal for domestic, industrial and energy uses.

Multi-Source Weather (further referred to as MSWX) is a high-resolution (3-hourly, 0.1°), bias-corrected meteorological product with global coverage from 1979 to 7 months into the future. The data for 42 years (∼ 316 700 files in NetCDF format, in total 128.0 GB) were retrieved via https://www.gloh2o.org/mswx/, last access: 18 June 2024. For more detailed information, see Beck et al. (2022).

MSWX 2 m daily and monthly maximum and minimum air temperatures were used to account for the climate-induced intra- and inter-annual fluctuations of domestic, livestock, and energy water demands.

Huang et al. (2018) is a publication presenting a 0.5° resolution global monthly gridded sectoral water withdrawal dataset for the period 1971–2010.

Huang et al. (2018) Table 3 (calibrated R coefficient values) and Eqs. (2) to (6) were used for temporal downscaling of domestic and energy water demands in this study, respectively.

Appendix B

Unit conversion to fraction is done in the following way:

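Hectare (ha): $\mathrm{fraction} = \mathrm{ha} \times 10^{4} / \mathrm{GridCellArea}_{\mathrm{m}^{2}}$.
Percentage (%): $\mathrm{fraction} = \% / 100$.
Class (land cover type): $\mathrm{fraction} = 1$, i.e. assumes full 100 % coverage of the grid cell.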
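
The three conversion rules above can be written as a minimal sketch. The function name, the unit labels and the example numbers are illustrative assumptions; only the arithmetic follows the rules stated in this appendix.

```python
def to_fraction(value, unit, grid_cell_area_m2=None):
    """Convert a source value to a grid-cell fraction.

    unit: "ha" (hectares), "%" (percentage) or "class" (land cover class).
    grid_cell_area_m2 is only needed for hectares.
    """
    if unit == "ha":
        if grid_cell_area_m2 is None:
            raise ValueError("grid_cell_area_m2 is required for hectares")
        # 1 ha = 10^4 m2, so the area in m2 is divided by the grid cell area in m2.
        return value * 1.0e4 / grid_cell_area_m2
    if unit == "%":
        return value / 100.0
    if unit == "class":
        # A discrete class is assumed to cover the whole grid cell.
        return 1.0
    raise ValueError(f"unknown unit: {unit}")

# Hypothetical examples: 250 ha in a 10 km2 grid cell, and 40 % coverage.
hectare_fraction = to_fraction(250.0, "ha", grid_cell_area_m2=1.0e7)
percent_fraction = to_fraction(40.0, "%")
class_fraction = to_fraction(12, "class")
```
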
Appendix C

C1 Soil depth

Soil depth layers are derived following Burek et al. (2014), in which the total soil depth is divided into three horizontal layers. The total soil depth is the “absolute_depth_to_bedrock” from SoilGrids250m, whereas root depths of forest and non-forest are derived from FAO56 and CGLS-LC100 datasets at SoilGrids250m native (∼ 250 m) resolution (see Sect. 6.2 for more details). The methodology implemented for the creation of the three soil layers is the following.

Soil depth layer 1 (surface), SD1, is assumed constant, equal to 50 mm all over the world for consistency with satellite-derived datasets (satellite signal penetration depth of 50 mm is a good approximation to take into account different meteorological conditions at different hours of the day globally based on Lv et al. (2018)), and follows Eq. (C1):

(C1) $\mathrm{SD}_1 = 50\ \mathrm{mm}$

Soil depth layer 2 (middle), SD2, depends on the absolute depth to bedrock (adb): if adb is less than or equal to 300 mm, the computation follows Eq. (C2); otherwise, it is a function of the root depth as per Eq. (C3) and must meet the requirement of Eq. (C4):

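(C2) $\mathrm{SD}_2 = (\mathrm{adb} - \mathrm{SD}_1)/2, \quad \mathrm{adb} \le 300\ \mathrm{mm}$;
(C3) $\mathrm{SD}_2 = \min\left(\mathrm{root\_depth},\ (\mathrm{adb} - 300\ \mathrm{mm} - \mathrm{SD}_1)\right), \quad \mathrm{adb} > 300\ \mathrm{mm}$;
(C4) $\mathrm{SD}_2 = 50\ \mathrm{mm}, \quad \mathrm{SD}_2 < 50\ \mathrm{mm}$.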

Soil depth layer 3 (bottom), SD3, is computed following Eq. (C5):

(C5) $\mathrm{SD}_3 = \mathrm{adb} - (\mathrm{SD}_1 + \mathrm{SD}_2)$.

This set of equations is used twice, once with the root depth of forest area and a second time with the root depth of non-forested areas, resulting in a total of six soil depth layers computed at the SoilGrids250m native resolution.
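
Equations (C1) to (C5) can be sketched for a single grid cell as follows. The function and variable names are illustrative. Applying the 50 mm minimum of Eq. (C4) after both Eq. (C2) and Eq. (C3) is an assumption of this sketch, and no additional guard for very shallow soils is included.

```python
def soil_depth_layers(adb_mm, root_depth_mm):
    """Return (SD1, SD2, SD3) in mm for one grid cell, following Eqs. (C1)-(C5).

    adb_mm:        absolute depth to bedrock in mm.
    root_depth_mm: root depth in mm (forest or non-forest).
    """
    sd1 = 50.0  # Eq. (C1): constant surface layer
    if adb_mm <= 300.0:
        sd2 = (adb_mm - sd1) / 2.0  # Eq. (C2)
    else:
        sd2 = min(root_depth_mm, adb_mm - 300.0 - sd1)  # Eq. (C3)
    if sd2 < 50.0:
        sd2 = 50.0  # Eq. (C4), applied to both branches (assumption)
    sd3 = adb_mm - (sd1 + sd2)  # Eq. (C5)
    return sd1, sd2, sd3

# Hypothetical cells: deep soil, shallow soil, and a soil just above 300 mm.
deep = soil_depth_layers(2000.0, 1200.0)
shallow = soil_depth_layers(200.0, 1200.0)
intermediate = soil_depth_layers(360.0, 1200.0)
```

Calling the function once with the forest root depth and once with the non-forest root depth gives the six soil depth layers described above.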

C2 Soil hydraulic parameters

Soil hydraulic parameters are derived by following three main steps (see Fig. C1).

First, soil hydraulic properties are derived at native resolution by applying pedotransfer functions (PTFs) to each SoilGrids250m soil characteristics layer at each available depth. Pedotransfer functions translate field-measured soil information (such as soil texture, pH and structure) into properties and parameters needed to describe soil processes. The PTFs implemented here are the ones proposed by Tóth et al. (2015). Users can decide to derive soil properties from different PTFs, but the general principle presented here remains valid.

Second, the soil hydraulic parameters calculated at SoilGrids250m depths are vertically downscaled to the model soil depth (previously computed) by weighted average (Fig. C1, step 2 with saturated volumetric soil moisture content as an example) at the native SoilGrids250m resolution (∼ 250 m).

Third, the soil hydraulic parameters at the final soil depths are upscaled from native to final resolution by average, using forest and non-forest fraction layers as weights (Fig. C1, step 3).
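
The second and third steps can be illustrated with a minimal sketch. It assumes that the vertical weights are the thicknesses over which each source depth interval overlaps the model layer, and that the horizontal upscaling is a block average weighted by a fraction layer; the function names, the interval boundaries and the toy values are all illustrative assumptions and may differ from the weights used in the actual workflow.

```python
import numpy as np

def layer_average(values, src_top, src_bottom, layer_top, layer_bottom):
    """Overlap-weighted mean of a parameter given on source depth intervals (step 2).

    values, src_top, src_bottom: one entry per source interval (depths in mm).
    layer_top, layer_bottom:     boundaries of the model soil layer (mm).
    """
    values = np.asarray(values, dtype=float)
    src_top = np.asarray(src_top, dtype=float)
    src_bottom = np.asarray(src_bottom, dtype=float)
    # Thickness shared by each source interval and the model layer.
    overlap = np.clip(
        np.minimum(src_bottom, layer_bottom) - np.maximum(src_top, layer_top),
        0.0, None)
    if overlap.sum() <= 0.0:
        raise ValueError("model layer does not overlap any source interval")
    return float((values * overlap).sum() / overlap.sum())

def upscale_weighted(values, weights, factor):
    """Weighted block average from native to coarser resolution (step 3).

    values, weights: 2-D arrays on the native grid (weights e.g. forest fraction).
    factor:          number of native cells per coarse cell along each axis.
    Coarse cells with zero total weight are set to NaN.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ny, nx = values.shape
    if ny % factor or nx % factor:
        raise ValueError("grid size must be a multiple of factor")
    shape = (ny // factor, factor, nx // factor, factor)
    v = values.reshape(shape)
    w = weights.reshape(shape)
    weight_sum = w.sum(axis=(1, 3))
    weighted_sum = (v * w).sum(axis=(1, 3))
    out = np.full(weight_sum.shape, np.nan)
    np.divide(weighted_sum, weight_sum, out=out, where=weight_sum > 0)
    return out

# Step 2: hypothetical values on intervals 0-50, 50-150 and 150-300 mm,
# averaged over a model layer spanning 50-250 mm.
theta_s = layer_average([0.50, 0.46, 0.42], [0, 50, 150], [50, 150, 300], 50, 250)

# Step 3: hypothetical 2 x 2 native grid upscaled to one coarse cell,
# weighted by forest fraction.
coarse = upscale_weighted([[0.4, 0.5], [0.6, 0.3]], [[1.0, 0.0], [1.0, 0.0]], 2)
```

In the vertical example only the second and third source intervals overlap the model layer, each by 100 mm, so the result is the plain mean of their two values. In the horizontal example only the two forested native cells contribute to the coarse value.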

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f22

Figure C1. Creation of saturated volumetric soil moisture content parameter “Qs” using the SoilGrids250m dataset “SGRID” and forest fraction.


Appendix D

Here, more regional examples of the most interesting surface fields of CEMS_SurfaceFields_2022 are provided to show what level of detail is available at each resolution and field, as well as to emphasise consistency through all the fields, which is the most valuable requirement when running any type of surface model.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f23

Figure D1. Upstream drainage area (in m²), standard deviation of elevation (in m), fraction of forest, fraction of inland water, fraction of irrigated crops, fraction of rice, crop coefficient for forest, crop group number for forest, saturated soil hydraulic conductivity for forested areas of soil depth layer 2 (in mm d⁻¹), saturated volumetric soil moisture (i.e. water) content for forested areas of soil depth layer 2, lake mask, and water demand for domestic use at 1 arcmin (∼1.9 km at the Equator) resolution for the Danube River area in Europe.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f24

Figure D2. Same as Fig. D1 but at 3 arcmin (∼5.6 km at the Equator) resolution for the Danube River area in Europe.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f25

Figure D3. Same as Fig. D1 but at 1 arcmin (∼1.9 km at the Equator) resolution for the Rhine River area in Germany.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f26

Figure D4. Same as Fig. D1 but at 3 arcmin (∼5.6 km at the Equator) resolution for the Rhine River area in Germany.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f27

Figure D5. Same as Fig. D1 but at 1 arcmin (∼1.9 km at the Equator) resolution for the Seine River area in France.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f28

Figure D6. Same as Fig. D1 but at 3 arcmin (∼5.6 km at the Equator) resolution for the Seine River area in France.

https://hess.copernicus.org/articles/28/2991/2024/hess-28-2991-2024-f29

Figure D7. Same as Fig. D1 but at 3 arcmin (∼5.6 km at the Equator) resolution for the Mekong River area in Cambodia.

Data availability

The CEMS_SurfaceFields_2022 datasets are freely available for download from the JRC Data Catalogue and are documented in this paper: globally at 3 arcmin resolution, ∼5.6 km at the Equator (https://data.jrc.ec.europa.eu/dataset/68050d73-9c06-499c-a441-dc5053cb0c86, Choulga et al., 2023a), and over Europe at 1 arcmin resolution, ∼1.9 km at the Equator (https://data.jrc.ec.europa.eu/dataset/f572c443-7466-4adf-87aa-c0847a169f23, Choulga et al., 2023b).

Author contributions

CP and PS shaped the initial plan of the research; MC and FM executed the initial plan; CM, SG and JD reviewed initial results and provided guidance in further research. MC, FM and CP prepared a first draft of the paper, which was adapted to its present state by contributions from CM, SG, JD, PS and HB.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

CEMS_SurfaceFields_2022 is a product and service of the Copernicus Emergency Management Service. Margarita Choulga, Francesca Moschini, Cinzia Mazzetti and Christel Prudhomme have been supported by contract 941462-IPR-2021. The authors thank two anonymous reviewers for their valuable comments and suggestions that helped to shape the paper to its current state.

Financial support

This research has been supported by the Joint Research Centre (grant no. 941462-IPR-2021).

Review statement

This paper was edited by Frederiek Sperna Weiland and reviewed by two anonymous referees.

References

Alfieri, L., Lorini, V., Hirpa, F. A., Harrigan, S., Zsoter, E., Prudhomme, C., and Salamon, P.: A global streamflow reanalysis for 1980–2018, J. Hydrol. X, 6, 100049, https://doi.org/10.1016/j.hydroa.2019.100049, 2020. 

Allen, R. G., Pereira, L. S., Raes, D., and Smith, M.: FAO Irrigation and Drainage Paper No. 56: Crop Evapotranspiration (guidelines for computing crop water requirements), https://www.researchgate.net/publication/284300773_FAO_Irrigation_and_drainage_paper_No_56 (last access: 18 June 2024), 1998. 

Balsamo, G.: Interactive lakes in the Integrated Forecasting System, ECMWF Newsletter No. 137 – Autumn 2013, 30–34, https://doi.org/10.21957/rffv1gir, 2013. 

Bates, P. D. and De Roo, A. P. J.: A simple raster-based model for flood inundation simulation, J. Hydrol., 236, 54–77, https://doi.org/10.1016/S0022-1694(00)00278-X, 2000. 

Beck, H. E., van Dijk, A. I. J. M., Larraondo, P. R., McVicar, T. R., Pan, M., Dutra, E., and Miralles, D. G.: MSWX: Global 3-Hourly 0.1° Bias-Corrected Meteorological Data Including Near-Real-Time Updates and Forecast Ensembles, B. Am. Meteorol. Soc., 103, E710–E732, https://doi.org/10.1175/BAMS-D-21-0145.1, 2022. 

Belleza, G. A. C., Bierkens, M. F. P., and van Vliet, M. T. H.: Sectoral water use responses to droughts and heatwaves: analyses from local to global scales for 1990–2019, Environ. Res. Lett., 18, 104008, https://doi.org/10.1088/1748-9326/acf82e, 2023. 

Best, M. J., Pryor, M., Clark, D. B., Rooney, G. G., Essery, R. L. H., Ménard, C. B., Edwards, J. M., Hendry, M. A., Porson, A., Gedney, N., Mercado, L. M., Sitch, S., Blyth, E., Boucher, O., Cox, P. M., Grimmond, C. S. B., and Harding, R. J.: The Joint UK Land Environment Simulator (JULES), model description – Part 1: Energy and water fluxes, Geosci. Model Dev., 4, 677–699, https://doi.org/10.5194/gmd-4-677-2011, 2011. 

Bhardwaj, A.: Evaluation of Openly Accessible MERIT DEM for Vertical Accuracy in Different Topographic Regions of India. 39th INCA International Congress, Dehradun (India), 39, 239–245, https://www.researchgate.net/publication/356726804_Evaluation_of_openly_Accessible_MERIT_DEM_for_vertical_accuracy_in_different_topographic_regions_of_India (last access: 18 June 2024), 2021. 

Bonan, G., Levis, S., Sitch, S., Vertenstein, M., and Oleson, K.: A dynamic global vegetation model for use with climate models: Concepts and description of simulated vegetation dynamics, Glob. Change Biol., 9, 1543–1566, https://doi.org/10.1046/j.1365-2486.2003.00681.x, 2003. 

Brooks, R. H. and Corey, A. T.: Hydraulic properties of porous media. Hydrology paper No. 3, Colorado State Univ., Fort Collins, https://mountainscholar.org/bitstream/handle/10217/61288/HydrologyPapers_n3.pdf (last access: 18 June 2024), 1964. 

Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.-E., Herold, M., and Fritz, S.: Copernicus Global Land Service: Land Cover 100 m: collection 3: epoch 2015: Globe (V3.0.1), Zenodo [data set], https://doi.org/10.5281/zenodo.3939038, 2020. 

Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.-E., Li, L., and Tarko, A.: Copernicus Global Land Service: Land Cover 100 m: version 3 Globe 2015-2019: Product User Manual (Dataset v3.0, doc issue 3.4), Zenodo [data set], https://doi.org/10.5281/zenodo.4723921, 2021. 

Burek, P., Van Der Knijff, J., and De Roo, A.: LISFLOOD – Distributed Water Balance and Flood Simulation Model – Revised User Manual. Luxembourg: Publications Office of the European Union, JRC78917, https://publications.jrc.ec.europa.eu/repository/handle/JRC78917 (last access: 18 June 2024), 2013. 

Burek, P., Bianchi, A., and Gentile, A.: JRC Technical Report: A Pan-European Data Set for hydrological modelling. Luxembourg: Publications Office of the European Union, https://ec-jrc.github.io/lisflood/pdfs/Dataset_hydro.pdf (last access: 18 June 2024), 2014. 

Büttner, G. and Kosztra, B.: CLC2018 Technical Guidelines, Environment Agency, Austria, https://land.copernicus.eu/en/technical-library/clc-2018-technical-guidelines/@@download/file (last access: 18 June 2024), 2017. 

Calvin, K., Patel, P., Clarke, L., Asrar, G., Bond-Lamberty, B., Cui, R. Y., Di Vittorio, A., Dorheim, K., Edmonds, J., Hartin, C., Hejazi, M., Horowitz, R., Iyer, G., Kyle, P., Kim, S., Link, R., McJeon, H., Smith, S. J., Snyder, A., Waldhoff, S., and Wise, M.: GCAM v5.1: representing the linkages between energy, water, land, climate, and economic systems, Geosci. Model Dev., 12, 677–698, https://doi.org/10.5194/gmd-12-677-2019, 2019. 

Carver, G.: Ten years of OpenIFS at ECMWF, ECMWF Newsletter No. 170 – Winter 2021/22, 6–7, https://www.ecmwf.int/en/newsletter/170/news/ten-years-openifs-ecmwf (last access: 18 June 2024), 2022. 

Chai, L. T., Wong, C. J., James, D., Loh, H. Y., Liew, J. J. F., Wong, W. V. C., and Phua, M. H.: Vertical accuracy comparison of multi-source Digital Elevation Model (DEM) with Airborne Light Detection and Ranging (LiDAR). IOP Conf. Ser.: Earth Environ. Sci., 1053, 012025, https://doi.org/10.1088/1755-1315/1053/1/012025, 2022. 

Choulga, M., Moschini, F., Mazzetti, C., Grimaldi, S., Disperati, J., Beck, H., Salamon, P., and Prudhomme, C.: LISFLOOD static and parameter maps for GloFAS, JRC Data Catalogue [data set], https://data.jrc.ec.europa.eu/dataset/68050d73-9c06-499c-a441-dc5053cb0c86 (last access: 18 June 2024), 2023a. 

Choulga, M., Moschini, F., Mazzetti, C., Grimaldi, S., Disperati, J., Beck, H., Salamon, P., and Prudhomme, C.: LISFLOOD static and parameter maps for Europe, JRC Data Catalogue [data set], https://data.jrc.ec.europa.eu/dataset/f572c443-7466-4adf-87aa-c0847a169f23 (last access: 18 June 2024), 2023b. 

Clark, D. B., Mercado, L. M., Sitch, S., Jones, C. D., Gedney, N., Best, M. J., Pryor, M., Rooney, G. G., Essery, R. L. H., Blyth, E., Boucher, O., Harding, R. J., Huntingford, C., and Cox, P. M.: The Joint UK Land Environment Simulator (JULES), model description – Part 2: Carbon fluxes and vegetation dynamics, Geosci. Model Dev., 4, 701–722, https://doi.org/10.5194/gmd-4-701-2011, 2011. 

Defourny, P., Lamarche, C., Bontemps, S., De Maet, T., Van Bogaert, E., Moreau, I., Brockmann, C., Boettcher, M., Kirches, G., Wevers, J., and Santoro, M.: Land Cover CCI PRODUCT USER GUIDE VERSION 2.0 Tech. Rep. ESA, https://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf (last access: 18 June 2024), 2017. 

De Roo, A. P. J., Van Der Knijff, J., Schmuck, G., and Bates, P.: A simple floodplain inundation model to assist in floodplain management. New trends in water and environmental engineering for safety and life: Eco-compatible solutions for aquatic environments, Balkema, Rotterdam, the Netherlands, 1–10, 184 pp., ISBN 9789058091383, 2000. 

De Roo, A., Odijk, M., Schmuck, G., Koster, E., and Lucieer, A.: Assessing the effects of land use changes on floods in the Meuse and Oder catchment, Phys. Chem. Earth. Pt. B, 26, 593–599, 2001. 

Dolan, F., Lamontagne, J., Link, R., Hejazi, M., Reed, P., and Edmonds, J.: Evaluating the economic impact of water scarcity in a changing world, Nat. Commun., 12, 1915, https://doi.org/10.1038/s41467-021-22194-0, 2021. 

Florczyk, A. J., Corbane, C., Ehrlich, D., Freire, S., Kemper, T., Maffenini, L., Melchiorri, M., Pesaresi, M., Politis, P., Schiavina, M., Sabo, F., Zanchetta, L., European Commission, Joint Research Centre: GHSL Data Package 2019 – Public release GHS P2019, EUR 29788 EN, Publications Office of the European Union, Luxembourg, JRC 117104, ISBN 978-92-76-13186-1, https://doi.org/10.2760/290498, 2019. 

Fredlund, D. G. and Xing, A.: Equations for the Soil-Water Characteristic Curve, Can. Geotech. J., 31, 521–532, https://doi.org/10.1139/t94-061, 1994. 

Freire, S., MacManus, K., Pesaresi, M., Doxsey-Whitfield, E., and Mills, J.: Development of new open and free multi-temporal global population grids at 250 m resolution. Geospatial Data in a Changing World, Association of Geographic Information Laboratories in Europe (AGILE), https://www.researchgate.net/profile/Martino-Pesaresi-2/publication/304625387 (last access: 18 June 2024), 2016. 

Gardner, W. R.: Calculation of capillary conductivity from pressure plate outflow data, Soil Science Society Proceeding, 20, 317–320, https://doi.org/10.2136/sssaj1956.03615995002000030006x, 1956. 

Google Earth Engine: Frequently Asked Questions (GEE: FAQ), https://earthengine.google.com/faq/ (last access: 18 June 2024), 2023. 

Gilbert, M., Nicolas, G., Cinardi, G., Van Boeckel, T. P., Vanwambeke, S. O., Wint, G. R. W., and Robinson, T. P.: Global distribution data for cattle, buffaloes, horses, sheep, goats, pigs, chickens and ducks in 2010, Sci. Data, 5, 180227, https://doi.org/10.1038/sdata.2018.227, 2018. 

Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R.: Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ., 202, 18–27, https://doi.org/10.1016/j.rse.2017.06.031, 2017. 

Grimaldi, S., Salamon, P., Disperati, J., Zsoter, E., Russo, C., Ramos, A., Carton De Wiart, C., Barnard, C., Hansford, E., Gomes, G., and Prudhomme, C.: River discharge and related historical data from the Global Flood Awareness System. v4.0. Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], https://doi.org/10.24381/cds.a4fdd6b9, 2022. 

Gupta, S., Lehmann, P., Bonetti, S., Papritz, A., and Or, D.: Global prediction of soil saturated hydraulic conductivity using random forest in a Covariate-based GeoTransfer Function (CoGTF) framework, J. Adv. Model. Earth Sy., 13, e2020MS002242, https://doi.org/10.1029/2020MS002242, 2021. 

Harrigan, S., Zsoter, E., Cloke, H., Salamon, P., and Prudhomme, C.: Daily ensemble river discharge reforecasts and real-time forecasts from the operational Global Flood Awareness System, Hydrol. Earth Syst. Sci., 27, 1–19, https://doi.org/10.5194/hess-27-1-2023, 2023. 

Hejazi, M., Edmonds, J., Clarke, L., Kyle, P., Davies, E., Chaturvedi, V., Wise, M., Patel, P., Eom, J., Calvin, K., Moss, R., and Kim, S.: Long-term global water projections using six socioeconomic scenarios in an integrated assessment modeling framework, Technol. Forecast. Soc., 81, 205–226, https://doi.org/10.1016/j.techfore.2013.05.006, 2014. 

Hengl, T., de Jesus, J. M., MacMillan, R. A., Batjes, N. H., Heuvelink, G. B. M., Ribeiro, E., Samuel-Rosa, A., Kempen, B., Leenaars, J. G. B., Walsh, M. G., and Ruiperez Gonzalez, M.: SoilGrids1km – Global Soil Information Based on Automated Mapping, PLOS ONE, 9, e105992, https://doi.org/10.1371/journal.pone.0105992, 2014. 

Hengl, T., de Jesus, J. M., Heuvelink, G. B. M., Ruiperez Gonzalez, M., Kilibarda, M., Blagotić, A., Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara, M. A., Vargas, R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G. B., Ribeiro, E., Wheeler, I., Mantel, S., and Kempen, B.: SoilGrids250m: Global gridded soil information based on machine learning, PLOS ONE, 12, 1–40, https://doi.org/10.1371/journal.pone.0169748, 2017. 

Hirpa, F. A., Salamon, P., Beck, H. E., Lorini, V., Alfieri, L., Zsoter, E., and Dadson, S. J.: Calibration of the Global Flood Awareness System (GloFAS) using daily streamflow data, J. Hydrol., 566, 595–606, https://doi.org/10.1016/j.jhydrol.2018.09.052, 2018. 

Hodnett, M. G. and Tomasella, J.: Marked differences between van Genuchten soil water-retention parameters for temperate and tropical soils: a new water-retention pedo-transfer functions developed for tropical soils, Geoderma, 108, 155–180, https://doi.org/10.1016/S0016-7061(02)00105-2, 2002. 

Huang, Z., Hejazi, M., Li, X., Tang, Q., Vernon, C., Leng, G., Liu, Y., Döll, P., Eisner, S., Gerten, D., Hanasaki, N., and Wada, Y.: Reconstruction of global gridded monthly sectoral water withdrawals for 1971–2010 and analysis of their spatiotemporal patterns, Hydrol. Earth Syst. Sci., 22, 2117–2133, https://doi.org/10.5194/hess-22-2117-2018, 2018. 

Huijnen, V., Le Sager, P., Köhler, M. O., Carver, G., Rémy, S., Flemming, J., Chabrillat, S., Errera, Q., and van Noije, T.: OpenIFS/AC: atmospheric chemistry and aerosol in OpenIFS 43r3, Geosci. Model Dev., 15, 6221–6241, https://doi.org/10.5194/gmd-15-6221-2022, 2022. 

Intara, Y. I., Nusantara, A. D., Supanjani, Caniago, Z., and Ekawita, R.: Oil Palm Roots Architecture in Response to Soil Humidity, International Journal of Oil Palm, 1, 2, https://ijop.id/index.php/ijop/article/view/11/10 (last access: 18 June 2024), 2018. 

International Food Policy Research Institute (IFPRI): Global Spatially-Disaggregated Crop Production Statistics Data for 2010 Version 2.0, Harvard Dataverse [data set], https://doi.org/10.7910/DVN/PRFF8V, 2019. 

Karabulut, A., Egoh, B. N., Lanzanova, D., Grizzetti, B., Bidoglio, G., Pagliero, L., Bouraoui, F., Aloe, A., Reynaud, A., Maes, J., Vandecasteele, I., and Mubareka, S.: Mapping water provisioning services to support the ecosystem–water–food–energy nexus in the Danube river basin, Ecosyst. Serv., 17, 278–292, https://doi.org/10.1016/j.ecoser.2015.08.002, 2016. 

Karssenberg, D., Schmitz, O., Salamon, P., de Jong, K., and Bierkens, M. F. P.: A software framework for construction of process-based stochastic spatio-temporal models and data assimilation, Environ. Modell. Softw., 25, 489–502, https://doi.org/10.1016/j.envsoft.2009.10.004, 2010. 

Kimpson, T., Choulga, M., Chantry, M., Balsamo, G., Boussetta, S., Dueben, P., and Palmer, T.: Deep learning for quality control of surface physiographic fields using satellite Earth observations, Hydrol. Earth Syst. Sci., 27, 4661–4685, https://doi.org/10.5194/hess-27-4661-2023, 2023. 

Köhler, M. O., Hill, A. A., Huijnen, V., and Le Sager, P.: Enhancing OpenIFS by adding atmospheric composition capabilities, ECMWF Newsletter No. 175 – Spring 2023, 27–31, https://doi.org/10.21957/np36mk1s9d, 2023. 

Kohli, A., Frenken, K., and Spottorno, C.: Disambiguation of water statistics. FAO AQUASTAT Report, AQUASTAT Programme, FAO, 1–6, https://www.fao.org/3/bc816e/bc816e.pdf (last access: 18 June 2024), 2012. 

Laborte, A. G., Gutierrez, M. A., Balanza, J. G., Saito, K., Zwart, S. J., Boschetti, M., Murty, M. V. R., Villano, L., Aunario, J. K., Reinke, R., Koo, J., Hijmans, R. J., and Nelson, A.: RiceAtlas, a spatial database of global rice calendars and production, V3, Harvard Dataverse [data set], https://doi.org/10.7910/DVN/JE6R2R, 2017a. 

Laborte, A. G., Gutierrez, M. A., Balanza, J. G., Saito, K., Zwart, S. J., Boschetti, M., Murty, M. V. R., Villano, L., Aunario, J. K., Reinke, R., Koo, J., Hijmans, R. J., and Nelson, A.: RiceAtlas, a spatial database of global rice calendars and production. Sci. Data, 4, 170074, https://doi.org/10.1038/sdata.2017.74, 2017b. 

Lawrence, D. M., Fisher, R. A., Koven, C. D., Oleson, K. W., Swenson, S. C., Bonan, G., Collier, N., Ghimire, B., van Kampenhout, L., Kennedy, D., Kluzek, E., Lawrence, P. J., Li, F., Li, H., Lombardozzi, D., Riley, W. J., Sacks, W. J., Shi, M., Vertenstein, M., Wieder, W. R., Xu, C., Ali, A. A., Badger, A. M., Bisht, G., van den Broeke, M., Brunke, M. A., Burns, S. P., Buzan, J., Clark, M., Craig, A., Dahlin, K., Drewniak, B., Fisher, J. B., Flanner, M., Fox, A. M., Gentine, P., Hoffman, F., Keppel-Aleks, G., Knox, R., Kumar, S., Lenaerts, J., Leung, L. R., Lipscomb, W. H., Lu, Y., Pandey, A., Pelletier, J. D., Perket, J., Randerson, J. T., Ricciuto, D. M., Sanderson, B. M., Slater, A., Subin, Z. M., Tang, J., Thomas, R. Q., Val Martin, M., and Zeng, X.: The Community Land Model version 5: Description of new features, benchmarking, and impact of forcing uncertainty, J. Adv. Model. Earth Sy., 11, 4245–4287, https://doi.org/10.1029/2018MS001583, 2019. 

Lehner, B. and Döll, P.: Development and validation of a global database of lakes, reservoirs and wetlands, J. Hydrol., 296, 1–22, https://doi.org/10.1016/j.jhydrol.2004.03.028, 2004. 

Lv, S., Zeng, Y., Wen, J., Zhao, H., and Su, Z.: Estimation of Penetration Depth from Soil Effective Temperature in Microwave Radiometry, Remote Sens., 10, 519, https://doi.org/10.3390/rs10040519, 2018. 

Marthews, T. R., Dadson, S. J., Clark, D. B., Blyth, E. M., Hayman, G. D., Yamazaki, D., Becher, O. R. E., Martínez-de la Torre, A., Prigent, C., and Jiménez, C.: Inundation prediction in tropical wetlands from JULES-CaMa-Flood global land surface simulations, Hydrol. Earth Syst. Sci., 26, 3151–3175, https://doi.org/10.5194/hess-26-3151-2022, 2022. 

Martínez-Sánchez, E.: Scientific Quality Evaluation of LAI/FAPAR/FCOVER Collection 1km Version 1 and Version 2 Issue I1.00, https://land.copernicus.eu/global/sites/cgls.vito.be/files/products/CGLOPS1_SQE2019_LAI1km-V1&V2_I1.00.pdf (last access: 18 June 2024), 2020. 

McManamay, R. A., KC, B., Allen-Dumas, M. R., Kao, S. C., Brelsford, C. M., Ruddell, B. L., Sanyal, J., Stewart, R. N., and Bhaduri, B. L.: Reanalysis of water withdrawal for irrigation, electric power, and public supply sectors in the conterminous United States, 1950–2016, Water Resour. Res., 57, e2020WR027751, https://doi.org/10.1029/2020WR027751, 2021. 

Moiret-Guigand, A.: Copernicus Land monitoring services – CLC2018/CLCC1218 VALIDATION REPORT, Issue 1.3, https://land.copernicus.eu/en/technical-library/clc-2018-and-clc-change-2012-2018-validation-report/@@download/file (last access: 18 June 2024), 2021. 

O'Callaghan, J. F. and Mark, D. M.: The Extraction of Drainage Networks from Digital Elevation Data, Comput. Vision Graph., 28, 323–344, https://doi.org/10.1016/S0734-189X(84)80011-0, 1984. 

Pesaresi, M. and Politis, P.: GHS-BUILT-S R2022A – GHS built-up surface grid, derived from Sentinel2 composite and Landsat, multitemporal (1975–2030), European Commission, Joint Research Centre (JRC) [data set], https://doi.org/10.2905/D07D81B4-7680-4D28-B896-583745C27085, 2022. 

Rosbjerg, D. and Madsen, H.: Concepts of hydrologic modeling. In Encyclopedia of Hydrological Sciences, edited by: Anderson, M. G. and McDonnell, J. J., 10, 1–9, https://doi.org/10.1002/0470848944.hsa009, 2006. 

Schiavina, M., Freire, S., and MacManus, K.: GHS-POP R2019A – GHS population grid multitemporal (1975, 1990, 2000, 2015). European Commission, Joint Research Centre (JRC) [data set], https://doi.org/10.2905/0C6B9751-A71F-4062-830B-43C9F432370F, 2019. 

Schiavina, M., Melchiorri, M., Pesaresi, M., Politis, P., Freire, S., Maffenini, L., Florio, P., Ehrlich, D., Goch, K., Tommasi, P., Kemper, T., European Commission, Joint Research Centre: GHSL Data Package 2022 – Public release GHS P2022, Publication Office of the European Union, JRC 129516, ISBN 978-92-76-53071-8, https://doi.org/10.2760/19817, 2022. 

Smets, B.: Product User Manual of LAI/FAPAR/FCOVER Collection 1km Version 2 Issue 1.33, https://land.copernicus.eu/global/sites/cgls.vito.be/files/products/CGLOPS1_PUM_LAI1km-V2_I1.33.pdf (last access: 18 June 2024), 2019. 

Smith, P. J., Pappenberger, F., Wetterhall, F., Del Pozo, J. T., Krzeminski, B., Salamon, P., Muraro, D., Kalas, M., and Baugh, C.: On the operational implementation of the European Flood Awareness System (EFAS). Flood forecasting: A Global Perspective, Academic Press, 313–348, https://doi.org/10.1016/B978-0-12-801884-2.00011-6, 2016. 

Sparrow, S., Bowery, A., Carver, G. D., Köhler, M. O., Ollinaho, P., Pappenberger, F., Wallom, D., and Weisheimer, A.: OpenIFS@home version 1: a citizen science project for ensemble weather and climate forecasting, Geosci. Model Dev., 14, 3473–3486, https://doi.org/10.5194/gmd-14-3473-2021, 2021. 

Supit, I., Hoojer, A. A., and Van Diepen, C. A.: System description of the Wofost 6.0 crop simulation model implemented in CGMS. Volume 1: Theory and Algorithms, https://www.researchgate.net/publication/282287246_System_description_of_the_Wofost_60_crop_simulation_model_implemented_in_CGMS_Volume_1_Theory_and_Algorithms (last access: 18 June 2024), 1994. 

Te Chow, V.: Open-channel Hydraulics. Civil engineering series, McGraw-Hill Book Company Inc., ISBN 07-010776-9, https://heidarpour.iut.ac.ir/sites/heidarpour.iut.ac.ir/files/u32/open-chow.pdf (last access: 18 June 2024), 1959. 

Tóth, B., Weynants, M., Nemes, A., Makó, A., Bilas, G., and Tóth, G.: New generation of hydraulic pedotransfer functions for Europe, Eur. J. Soil Sci., 66, 226–238, https://doi.org/10.1111/ejss.12192, 2015. 

Van Der Knijff, J. and De Roo, A.: LISFLOOD – Distributed Water Balance and Flood Simulation Model, Revised User Manual, EUR 22166 EN/2, Office for Official Publications of the European Communities, Luxembourg, 109 pp., https://doi.org/10.2788/24719, 2008. 

Van Der Knijff, J. M., Younis, J., and De Roo, A. P. J.: LISFLOOD: A GIS-based distributed model for river basin scale water balance and flood simulation, Int. J. Geogr. Inf. Sci., 24, 189–212, 2010. 

van Genuchten, M. T.: A closed-form equation for predicting the hydraulic conductivity of unsaturated soils, Soil Sci. Soc. Am. J., 44, 892–898, 1980. 

Vanham, D., Alfieri, L., Flörke, M., Grimaldi, S., Lorini, V., De Roo, A., and Feyen, L.: The number of people exposed to water stress in relation to how much water is reserved for the environment: a global modelling study, The Lancet Planetary Health, 5, e766–e774, https://doi.org/10.1016/S2542-5196(21)00234-5, 2021. 

Vassolo, S. and Döll, P.: Global-scale gridded estimates of thermoelectric power and manufacturing water use, Water Resour. Res., 41, https://doi.org/10.1029/2004WR003360, 2005. 

Voisin, N., Liu, L., Hejazi, M., Tesfa, T., Li, H., Huang, M., Liu, Y., and Leung, L. R.: One-way coupling of an integrated assessment model and a water resources model: evaluation and implications of future changes over the US Midwest, Hydrol. Earth Syst. Sci., 17, 4555–4575, https://doi.org/10.5194/hess-17-4555-2013, 2013. 

Wada, Y., van Beek, L. P. H., Viviroli, D., Dürr, H. H., Weingartner, R., and Bierkens, M. F. P.: Global monthly water stress: 2. Water demand and severity of water stress, Water Resour. Res., 47, W07518, https://doi.org/10.1029/2010WR009792, 2011. 

Yamazaki, D., Oki, T., and Kanae, S.: Deriving a global river network map and its sub-grid topographic characteristics from a fine-resolution flow direction map, Hydrol. Earth Syst. Sci., 13, 2241–2251, https://doi.org/10.5194/hess-13-2241-2009, 2009. 

Yamazaki, D., Kanae, S., Kim, H., and Oki, T.: A physically based description of floodplain inundation dynamics in a global river routing model, Water Resour. Res., 47, W04501, https://doi.org/10.1029/2010WR009726, 2011. 

Yamazaki, D., Ikeshima, D., Tawatari, R., Yamaguchi, T., O'Loughlin, F., Neal, J. C., Sampson, C. C., Kanae, S., and Bates, P. D.: A high-accuracy map of global terrain elevations, Geophys. Res. Lett., 44, 5844–5853, https://doi.org/10.1002/2017GL072874, 2017. 

Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., and Pavelsky, T. M.: MERIT Hydro: A high-resolution global hydrography map based on latest topography datasets, Water Resour. Res., 55, 5053–5073, https://doi.org/10.1029/2019WR024873, 2019. 

Yu, Q., You, L., Wood-Sichra, U., Ru, Y., Joglekar, A. K. B., Fritz, S., Xiong, W., Lu, M., Wu, W., and Yang, P.: A cultivated planet in 2010 – Part 2: The global gridded agricultural-production maps, Earth Syst. Sci. Data, 12, 3545–3572, https://doi.org/10.5194/essd-12-3545-2020, 2020. 

Zhang, Y. and Schaap, M. G.: Estimation of saturated hydraulic conductivity with pedotransfer functions: A review, J. Hydrol., 575, 1011–1030, https://doi.org/10.1016/j.jhydrol.2019.05.058, 2019. 

Zhang, J., Liu, D., Guo, S., Xiong, L., Liu, P., Chen, J., and Yin, J.: High resolution annual irrigation water use maps in China based-on input variables selection and convolutional neural networks, Journal of Cleaner Production, 405, 136974, https://doi.org/10.1016/j.jclepro.2023.136974, 2023. 

1

JULES is a land surface model whose development is coordinated by the UK Met Office and the UK Centre for Ecology & Hydrology.

2

OpenIFS is a numerical weather forecast model available to external users for research and training.

3

CLM is an Earth system model with a strong climate component, maintained by the National Centre for Atmospheric Research but available for use by the wider research community.

4

LISFLOOD-OS is a spatially distributed water resources model developed by the Joint Research Centre and available for use and development through a code repository (available online: https://ec-jrc.github.io/lisflood/#lisflood, last access: 18 June 2024, https://ec-jrc.github.io/lisflood-code/, last access: 18 June 2024).

5

European Flood Awareness System (EFAS) version 5 (Smith et al., 2016; information available online: https://www.efas.eu/, last access: 18 June 2024).

6

Global Flood Awareness System (GloFAS) version 4 (Hirpa et al., 2018; Alfieri et al., 2020; Harrigan et al., 2023; Grimaldi et al., 2022; information available online: https://www.globalfloods.eu/, last access: 18 June 2024).

Short summary
The CEMS_SurfaceFields_2022 dataset is a new set of high-resolution maps of land type (e.g. lake, forest), soil properties and population water needs at approximately 2 and 6 km at the Equator, covering Europe and the globe (excluding Antarctica), respectively. We describe what new high-resolution information can be used to create the dataset and how. The paper suggests that the dataset can be used as input for river, weather or other models, as well as for statistical descriptions of a region of interest.