Following the rise of R as a scientific programming language, the increasing requirement for more transferable research and the growth of data availability in hydrology, R packages containing hydrological models are becoming increasingly available as an open-source resource for hydrologists. Because these packages sit at the core of the hydrological study workflow, they matter more and more for the reliability of methods and results. Despite the distinctiveness of packages and models, no study has yet compared R packages for conceptual rainfall–runoff modelling from a user perspective by contrasting their philosophy, model characteristics and ease of use. We selected eight packages based on our ability to consistently run their models on simple hydrological modelling examples. We analysed, in a uniform way, the exact structure of seven of the hydrological models integrated into these R packages in terms of conceptual storages and fluxes, spatial discretisation, data requirements and outputs provided. The analysis showed that very different modelling choices are associated with these packages and that these choices emphasise different hydrological concepts. These specificities are not always sufficiently well explained by the package documentation. Therefore, a synthesis of the package functionalities was performed from a user perspective. This synthesis helps to inform the selection of packages depending on the problem at hand. In this regard, the technical features, documentation, R implementations and computational times were investigated. Moreover, by providing a framework for package comparison, this study is a step towards more transferable and reusable methods and results for hydrological modelling in R.

Since the early 1960s, many hydrologists have been designing models to better understand water cycle processes controlling river flows

Various types of hydrological models exist, which differ according to their assumptions on the representation of natural processes and space and time dependencies

A large number of models can be found on the R platform. The R language

At a time when data management is a key issue in many branches of science, R has taken a central place in hydrology

The review paper published in the Hydrology and Earth System Sciences journal by

There is a wide variety of models contained in the R packages that we have selected for this study. To lay the foundations of our analysis, we first present in this section how we have selected the packages, then we introduce the framework for analysing the models and the packages from a user perspective. Here we make a distinction between R packages and the hydrological models that are implemented within the packages. In this framework for analysis, we separate model conceptualisation (Sect.

Deciding upon the number of packages implies finding the right balance between including many packages and conducting a thorough assessment. On the one hand, our aim was to select as many packages as possible in order to present an extensive comparison. On the other hand, only models with similar set-ups can be meaningfully compared; thus, we had to narrow our list. In this regard, we have selected the packages containing conceptual (bucket-type) continuous rainfall–runoff models as they were the most frequently encountered during our search and are widely used for many applications in hydrology

We based our search on the following four sources: the CRAN, GitHub (

Investigating the different hydrological characteristics behind the models contained in the R packages is a difficult but useful exercise. It aims at gathering information about the various hydrological visions available in a comparative framework. This comparison should help any student or more experienced hydrologist to understand what is involved when using a specific model implemented in one of the R packages. The selected packages contain various hydrological models based on different assumptions. These assumptions can be grouped into simplification options regarding storages, fluxes, time and space. In this comparative study, we first propose a unified comparison of the conceptual representation of storages and fluxes by the models included in the selected packages, then the spatial discretisation they imply and, finally, a description of model requirements and retrievable outputs via the packages. The unified representations should allow more consistent comparisons and, therefore, help modellers in their choice of methodology for a specific case study. Package documentation and source codes were thoroughly screened to conduct these analyses. This work was carried out in accordance with the comments and recommendations of most of the package authors.

Each model has its own degree of complexity regarding the representation of storages and fluxes. The differences in model structure partly depend on the perceptual model of how a catchment is functioning

We selected an approach for this analysis that derives from the work of

Users must be aware of the spatial discretisation that is available. Furthermore, some packages offer the possibility to apply different types of catchment discretisation to the same model. We therefore present the different cases for the selected packages after introducing the special case of the spatial distribution of snow modelling.

Since hydrological models do not always rely on the same assumptions, their requirements, i.e. data inputs and number of adjustable parameters, can differ. As data availability can sometimes be a limiting factor, it is essential for users to be informed about the model data requirements. The packages also allow the operation of the models at different time steps and imply different types of numerical resolution of the model equations. The equations of a hydrological model can be solved analytically (the exact solution is determined by integrating the equation over a given time step), explicitly (the solution is approximated using the derivative evaluated at the beginning of the time step) or implicitly (the solution is approximated using the derivative evaluated at the end of the time step). When the solution is analytical or explicit, the operator splitting technique (OS) is commonly applied to solve the model equations. When OS is applied, the different processes, such as evaporation, runoff and percolation, are calculated sequentially
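To make these solution strategies concrete, the following minimal Python sketch applies them to a single linear reservoir, dS/dt = -kS. This toy reservoir is an illustrative assumption and is not taken from any of the packages; the function names, the recession coefficient `k` and the time step `dt` are likewise hypothetical. The last function illustrates operator splitting on a simple bucket, where rainfall, evaporation and percolation are applied one after the other.

```python
import math


def step_analytical(s, k, dt):
    """Exact solution of dS/dt = -k*S integrated over one time step."""
    return s * math.exp(-k * dt)


def step_explicit(s, k, dt):
    """Explicit Euler: the derivative is evaluated at the beginning of the step."""
    return s + dt * (-k * s)


def step_implicit(s, k, dt):
    """Implicit Euler: the derivative is evaluated at the end of the step.

    S_new = S + dt * (-k * S_new)  =>  S_new = S / (1 + k * dt)
    """
    return s / (1.0 + k * dt)


def step_operator_splitting(s, rain, pet, k, dt):
    """Operator splitting: each process acts on the storage left by the previous one."""
    s = s + rain * dt                          # 1) rainfall input
    evap = min(pet * dt, s)                    # 2) evaporation, limited by storage
    s = s - evap
    perc = s * (1.0 - math.exp(-k * dt))       # 3) percolation (analytical drainage)
    s = s - perc
    return s, evap, perc


if __name__ == "__main__":
    s0, k, dt = 100.0, 0.5, 1.0
    for name, step in [("analytical", step_analytical),
                       ("explicit", step_explicit),
                       ("implicit", step_implicit)]:
        print(name, step(s0, k, dt))
    print(step_operator_splitting(10.0, 2.0, 3.0, 0.5, 1.0))
```

For this reservoir and 0 < k·dt < 1, the explicit scheme drains more water than the exact solution and the implicit scheme drains less. The explicit update also returns a negative storage when k·dt > 1, which is one reason why the choice of scheme and time step matters to users.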

By making different outputs available, R packages allow modellers to better assess the suitability of applying a model for a specific problem. These outputs can also facilitate the evaluation of parameter estimation, i.e. finding a consistent set of parameters. Among the practical outputs for a modeller, time series of actual evapotranspiration estimates can be useful for understanding the behaviour of the soil moisture accounting functions. Retrieving time series of runoff components (e.g. fast runoff and very quick runoff), which are highlighted by Sect.

Any modeller would need to understand these specificities in order to select and apply a model. We summarise these characteristics (requirements, time step and numerical resolution and outputs) in Sect.

The different packages implement a set of functionalities to operate the models, which can be more or less in line with the hydrological workflow, i.e. from data preparation to analysis of the results. These functionalities aim at easing and sometimes constraining the use of the model. One would expect to use all the functionalities required to consistently apply a specific model and avoid any supplementary source of errors. One of the specificities of R packages is the provided documentation. The related description and examples must be complete to ensure the appropriate application of the models. The user is guided by basic examples and is made aware of potential errors that can occur. Following the analysis of functionalities and documentation, we present an analysis of R implementation that should foster more rigorous applications of the models. In an effort to contribute to more extensive documentation relating to the packages and their models, we provide R scripts enabling the use of each package on simple examples. A short analysis of central processing unit (CPU) times is derived from the application of these scripts.

What a package provides in terms of functionalities is a distinguishing feature when choosing between it and a specific piece of software or another programming language for hydrological modelling. Among the main features, we usually find the careful preparation of input data to respect the right time references, initialisation period or specific R objects. Enabling an automatic calibration procedure to find a set of parameters consistent with the catchment of study can be an important step for some models as well
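As an illustration of how an initialisation period and an automatic calibration fit together, the following Python sketch calibrates a toy model against synthetic observations. Everything in it is an assumption made for illustration: the one-parameter linear reservoir, the Nash–Sutcliffe efficiency used as the criterion, the random search and all function names are hypothetical and do not correspond to the API of any of the packages.

```python
import random


def run_model(rain, k, s0=0.0):
    """Toy linear reservoir: storage gains rain and releases a fraction k per step."""
    s, q = s0, []
    for p in rain:
        s += p
        out = k * s
        s -= out
        q.append(out)
    return q


def nse(obs, sim, warmup=0):
    """Nash-Sutcliffe efficiency computed after discarding the warm-up period."""
    obs, sim = obs[warmup:], sim[warmup:]
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def calibrate(rain, obs, n_trials=500, warmup=5, seed=1):
    """Random search over k in (0, 1); returns the best (k, score) found."""
    rng = random.Random(seed)
    best_k, best_score = None, float("-inf")
    for _ in range(n_trials):
        k = rng.uniform(0.01, 0.99)
        score = nse(obs, run_model(rain, k), warmup)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    rain = [0, 5, 12, 0, 0, 3, 8, 0, 0, 0, 6, 1, 0, 0, 4, 0, 0, 9, 2, 0]
    obs = run_model(rain, k=0.3)  # synthetic "observations" with a known parameter
    print(calibrate(rain, obs))
```

Excluding the warm-up steps from the criterion keeps the arbitrary initial storage from influencing the score, which is the role the initialisation period plays in practice.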

We present in this analysis whether the selected packages integrate these basic functionalities to consistently apply the models. Inspections of the packages were conducted based on the different types of documents related to the packages and models. When judged necessary, the codes were analysed to ensure accurate results.

To handle the complexity associated with the different hydrological models and with the functionalities provided by the packages, the documentation is essential for any user. It is, therefore, important to assess whether the overall documentation is sufficient to easily make use of the package basics. In this regard, we compared the available explanatory documents. This analysis is, by definition, subjective as it relies on our experience as users. However, we think that it can still give insights into the meaningful content of the documentation, whereas analysing the documentation explanation by explanation would be too complicated to present. There are the following two different types of documentation related to these packages: the R documentation that includes user manuals (explanations of functions, which are mandatory for packages accepted by CRAN) and sometimes vignettes

Package practicalities can also be assessed through an analysis of the links between the main functions of a package. Such an examination could be useful to provide guidance regarding package application. We try to put ourselves in the shoes of users who have to apply the models of the different packages and, therefore, need to understand which function they have to use, where to use it in the script and how to use it. In this regard, we propose a unified diagram of the connections between the main functions that we have been able to run (see Fig.

Package developers made several choices in terms of R implementation that can affect package usability. For that reason, we analyse the programming languages and external dependencies. We also perform a short analysis of package CPU times.

Some packages are entirely coded in R, which is an interpreted language, and some integrate models coded with a compiled programming language interfaced with R. The different programming languages interfaced with R were identified by extracting the package sources because they could not necessarily be identified by simply displaying the code from the R console. We considered a package as dependent on external dependencies if one of its functions cannot be run without downloading another package. A package is not considered as being dependent on any other package when the use of an external package is only suggested in an example or in one of the related articles. Base packages, such as

From a user perspective, computation times can be meaningful to determine whether a package is suitable for a specific study. Short computation times are usually appreciated, especially when dealing with finer time steps or more complex spatial discretisations. Applying a model to a large database, generating an ensemble in operational (flood) forecasting or performing Monte Carlo runs for uncertainty analyses can also significantly increase computation times; hence, some of the packages include compiled code to speed up the production runs of the model. We analysed the CPU time required for one model run, which was estimated from 1000 runs with the
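The principle of estimating the CPU time of one run from repeated runs can be sketched as follows. This Python example is only a stand-in under stated assumptions: it uses `time.process_time` from the Python standard library rather than the tool used in this study, and the toy model and the helper `mean_cpu_time` are hypothetical.

```python
import time


def run_model(rain, k=0.3):
    """Toy linear reservoir used only as something to time."""
    s, q = 0.0, []
    for p in rain:
        s += p
        out = k * s
        s -= out
        q.append(out)
    return q


def mean_cpu_time(func, args=(), n_runs=1000):
    """Mean CPU time (seconds) of one call, estimated from n_runs repeated calls."""
    start = time.process_time()
    for _ in range(n_runs):
        func(*args)
    return (time.process_time() - start) / n_runs


if __name__ == "__main__":
    rain = [1.0] * 3650  # ten years of hypothetical daily rainfall
    print(mean_cpu_time(run_model, (rain,), n_runs=1000))
```

Repeating the run with the same inputs and parameter set and averaging reduces the influence of timer resolution and of background activity on the estimate.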

The outcome of our selection is a list of eight packages that will be carefully compared throughout the paper. Here we give a first overview of these packages along with their related bucket-type hydrological models. The full list is presented in Table

A list of the selected packages with their related models. The models included in the following analyses are in bold. The models included in

A list of the snow models contained in the selected packages.

We chose to exclude the following packages for the reasons given below:

The

The

The

The

The

The

The topography-based hydrological model

Driven by a desire to relax some of the assumptions of TOPMODEL, the authors proposed a new version, i.e. the dynamic TOPMODEL

The HBV model

The hydrological model assessment and development package

The

A modified version of the HBV rainfall–runoff model

The

The diagrams of Fig.

Unified diagrams illustrating the depiction of conceptual storages and fluxes by the main models contained in the selected packages.

The combined GR4J and CemaNeige snow models are both included in

Precipitation is divided into solid or liquid water for the calculation of snow accumulation and melt. Liquid precipitation and melt resulting from the snow function can either directly join the surface water reservoir (red rectangle) or enter the wetness index calculation. The wetness index determines the fraction of water infiltrating in the soil reservoir, which contains both the vadose zone and saturated zone (green/brown reservoir) or joining the linear quick-flow reservoir (yellow to orange gradient rectangle) that supplies the surface water reservoir. Evapotranspiration is retrieved from the surface water reservoir and from the vadose zone both as a function of PET and water contents. WALRUS integrates an explicit representation of the dynamic water table in shallow groundwater of lowland areas. The vadose zone concurrently interacts with the groundwater through the dynamic water table in the same reservoir. The overall saturation of the soil reservoir is governed by the dryness of the vadose zone, which determines the wetness index. The groundwater table depth is compared to the surface water level to determine either drainage towards the surface water or infiltration from the surface water. Discharge is a function of the surface water level. Losses and gains can occur from/to the groundwater reservoir by seepage and from/to the surface water by extraction or surface water supply.

As briefly introduced in Sect.

In the TOPMODEL 1995 version of

The dynamic version of TOPMODEL

In the Sacramento model of

In terms of conceptual storages and fluxes, the HBV model of

A simple degree day factor snow model (light blue) feeds, with melt or liquid precipitation, into a catchment moisture deficit model that represents soil moisture accounting (green). Evapotranspiration occurs from this store. The resulting effective rainfall is passed to a unit hydrograph, typically consisting of two flow paths (very quick/fast and slower groundwater) but with the potential for other configurations. These two runoff components are then added together to form the final discharge value.
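The degree-day principle mentioned above can be sketched in a few lines of Python. This is a generic, minimal formulation written for illustration and not the code of any of the packages; the threshold temperature `t_thresh`, the degree-day factor `ddf` and their default values are hypothetical.

```python
def degree_day_step(swe, precip, temp, t_thresh=0.0, ddf=3.0):
    """One time step of a simple degree-day snow routine.

    swe      snow water equivalent stored in the snowpack (mm)
    precip   total precipitation over the step (mm)
    temp     mean air temperature over the step (degC)
    t_thresh temperature separating snowfall from rainfall and starting melt (degC)
    ddf      degree-day factor (mm per degC per time step)

    Returns the updated snowpack and the liquid water (rain + melt) passed on
    to the soil moisture accounting store.
    """
    if temp <= t_thresh:
        snowfall, rainfall = precip, 0.0
    else:
        snowfall, rainfall = 0.0, precip
    swe += snowfall
    potential_melt = max(0.0, ddf * (temp - t_thresh))
    melt = min(swe, potential_melt)  # melt cannot exceed the available snow
    swe -= melt
    return swe, rainfall + melt


if __name__ == "__main__":
    swe = 0.0
    for precip, temp in [(10.0, -2.0), (4.0, 2.0), (0.0, 5.0)]:
        swe, liquid = degree_day_step(swe, precip, temp)
        print(swe, liquid)
```

The liquid output of this routine is what would feed the soil moisture accounting store in a structure such as the one described above.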

This unified representation of the model structures in terms of conceptual storages and fluxes reveals certain trends in the different modelling choices. Although it is clear that each structure has its own specificities, the schematics highlight several modelling similarities. When snow is taken into account (WALRUS, GR4J-CemaNeige, Sacramento, TUWmodel and IHACRES-CMD), the related calculations respect similar steps where total rainfall (solid

As presented in the previous section, some packages enable the application of a snow function along with the hydrological models they include (

All these packages allow one to proceed with snow calculations considering the catchment as a single unit. In that case, input data are aggregated at the catchment scale.

Example of GR4J-CemaNeige elevation zones

In the case of our selected models, the packages theoretically allow one or more of the spatial discretisation configurations illustrated in Fig.

Illustration of the three possible spatial discretisations concerning the models contained in the selected packages for this study. From left to right are the lumped configuration, hydrological response units (HRUs) configuration and sub-catchments configuration. The catchment outline is from the Meuse river in Saint-Mihiel (France). The HRUs were generated using a function of the

Possible spatial configurations for each model. If a package allows a specific configuration (✓), then it means that the model is coded for this configuration in the related package, but it does not mean that the necessary preprocessing functions are provided. The tilde (

When the models are applied with a lumped spatial configuration, inputs of precipitation and potential evapotranspiration are aggregated over the whole catchment. There is one set of parameters, which means that the model reservoirs represent the water content at the catchment scale. The model simulates a discharge output at the catchment outlet, where the hydrometric station is located.
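The aggregation of inputs for a lumped configuration can be illustrated by an area-weighted mean. The Python sketch below assumes that the input is available for a set of sub-areas of known size; the function name and the example values are hypothetical, and this is only one of several possible aggregation methods.

```python
def catchment_average(values, areas):
    """Area-weighted mean of an input (e.g. precipitation in mm) over sub-areas."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area


if __name__ == "__main__":
    precip_mm = [10.0, 20.0, 0.0]   # precipitation over three sub-areas
    area_km2 = [50.0, 30.0, 20.0]   # corresponding areas
    print(catchment_average(precip_mm, area_km2))
```

A lumped model then receives this single time series per input, whatever the spatial variability within the catchment.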

TOPMODEL 1995 does not rely on the same calculations as dynamic TOPMODEL, especially regarding the computational units. In this implementation of TOPMODEL 1995, inputs of precipitation and potential evapotranspiration are aggregated over the entire catchment (as a lumped model) although, in the original paper

The HBV model of

A large proportion of the packages that we have selected contain models that can be run as lumped models, though some of them can rely on a more complex spatial distribution with very specific characteristics. The most complex level of the spatial distribution is enabled by the

Table

The

The two versions of TOPMODEL require an analysis of digital terrain data and, hence, more preprocessing work. TUWmodel and Sacramento have the highest number of parameters to adjust; however, five parameters out of 15 for

The differences in terms of the time step and the resolution of model equations are summarised in Table

Table

Requirements to run the models and associated numerical resolutions. Note: D – daily; H – hourly; M – monthly; A – annual; FL – flexible; Num. res. – numerical resolution; OS – operator splitting; Ana – analytic; Exp – explicit; Imp – implicit;

Model outputs made available by the packages. Note: TS – time series; AET – actual evapotranspiration; RC – runoff components. The tilde (

The models presented in this comparison are based on similar but specific assumptions in terms of storages, fluxes and spatial discretisation. The models are formulated based on our knowledge of these properties and their spatial implications. For instance, predictions of TOPMODEL and dynamic TOPMODEL can be mapped back into space because of the direct routing on the hillslopes, which is implicit for TOPMODEL and explicit for dynamic TOPMODEL. This is a significant difference from other models relying on independent HRUs in terms of how processes are represented in relation to catchment characteristics. These assumptions are not valid for every catchment. Consistency with a perceptual model of catchment processes should be assessed before applying one of the models contained in the R packages. Whether it is through a complex representation of shallow groundwater contribution to runoff (leading to a higher number of parameters to estimate), more conceptual calculations of soil moisture or the discretisation of a catchment into different response areas (hence, more preprocessing operations), users now have more material on what the models really imply and how these specificities are made available as outputs by the packages.

Table

Functionalities provided by the packages. Note: ✓ – the item is offered by the package;

Weighted combinations of criteria are possible with the

Automatic calibration in the packages either corresponds to functions permitting the use of calibration algorithms from other packages with the package-specific R objects, or to the package's own algorithm. Complete examples of automatic calibration with

The plot function of

The

Table

Assessment of package documentation. Note: Description – information on the general purpose of the function and the associated possibilities; details – precise explanations of the function; arguments – description of the function arguments that requires details about the unit, the R object class and how to obtain it; value – description of the function outputs; references – related documentation where users can find more information on the function; examples – R commands to use the function (examples are considered as being comprehensive if they cover most of the functionalities and if there is an example for each function); data set – an example data set is provided and can be used with the package functions; steps between functions – explanations of the required stages to run the main functions. The last two fields are not explicit parts of the package manuals but can help one to understand and use a package.

The

Additional available package documentation and support. Note: @ – email; GL – GitLab; GH – GitHub (URLs in this table were last accessed on 20 September 2020).

Looking at Table

Considering the previous analysis along with package distinctiveness in terms of R implementation raises the question of how to use the packages containing hydrological models. Figure

Unified diagrams illustrating the package main functions that are necessary to proceed with parameter estimation and validation on a basic example in hydrology modelling. R core functions are not explicitly illustrated. Basic data preprocessing steps, such as checking data gaps and consistency, are not displayed on this diagram, but it is strongly recommended to adopt such practices before operating a model. Users can run

The diagrams of Fig.

As pointed out in Sect.

The path to better guidance and fewer mistakes in the application of the models, which implies more rigorous methods that include, for instance, uncertainty analyses, is made difficult by the wide heterogeneity of packages in terms of R structure and of models in terms of modelling choices. One could imagine better harmonisation of packages taking advantage of other packages' strengths (e.g. using a similar structure for the objective function or managing time complexities with the same R functions). This goal is not out of reach and would considerably improve the usability and scope of these packages. It would require defining sampling strategies for parameter estimation

Some packages are entirely coded in R, which is an interpreted language, and some integrate models coded with a compiled programming language interfaced with R, e.g. Fortran

Programming languages of the models used in the selected R packages.

Table

Package dependencies. Base packages and recommended packages are not explicitly taken into account in this table.

We hereby present the CPU time required for one model run with the selected packages (Fig.

Computation times of the packages (time of one run estimated from 1000 runs using the same parameter set with the

The results of Fig.

Following these analyses of hydrology modelling R packages, we discuss how users can actually apply one of the models offered by the selected packages to a specific application or research question, we identify what improvements could be brought to the packages, and we discuss the reasons and the implications of the implementation choices that were made on the packages.

While our analysis focuses on models that can be considered as conceptual rainfall–runoff models, we have highlighted different approaches, assumptions and choices underlying these models, both in terms of structure and spatial discretisation, which imply distinct numbers of parameters and sometimes specific inputs.

This attempt to simplify the users' selection process does not aim at labelling good and bad models or packages and, therefore, cannot shed light on which models hydrologists should use today. In our opinion, two major points should be drawn from this study in terms of hydrological modelling. First, this work should be considered as a tool to determine which models, within the R environment, best fit the specific requirements of the end-user and their perceptions about the dominant processes in a catchment

Appropriately using a new model is fundamentally difficult. Understanding the complexity resulting from the different modelling choices, the model inner consistency and whether it is appropriate for a specific research problem also depends on the perceptual model of the hydrology of a catchment

With respect to our list of packages, we have seen that three main groups of packages emerged from our analyses of practicalities.

The second group of packages, offering fewer functionalities with a simpler R structure, includes

Taking different sources of uncertainty into account, especially epistemic uncertainty in data inputs

The adaptability and efficiency of packages can also be assessed in terms of R structure. The choices that developers have made for their R packages, programming languages used for the core functions of the models and dependence on external packages for some functions have resulted in very different R structures. Using a compiled programming language for the core model functions (

Given that the R language can easily be operated for hydrological purposes, the growth of available modelling packages makes the choice of an appropriate package more complicated. To identify the barriers and opportunities for any hydrologist employing one of the packages, we first carried out a careful analysis of a selection of models contained in eight packages. These models were examined in terms of conceptual storages and fluxes, spatial discretisation, requirements and retrievable outputs. We then evaluated the packages regarding their practicality, i.e. the integrated technical features, the related documentation, the R hydrological workflow, CPU times and the programming languages interfaced with R. The results of our unified analyses confirmed that the selected models rely on different assumptions with regard to the conceptualisation of the water cycle and, therefore, differ in their emphasis on the main physical processes contributing to river flows. A model structure can be selected depending on our knowledge of the physical properties of a catchment. As the understanding of these properties is limited, hydrological models are subject to epistemic uncertainties

Our analysis aims to help R users to select the packages that best suit their requirements. The provided framework, which contains examples of how to use the models included in the selected packages, represents a first step towards more comprehensible and operable R packages for hydrological modelling. With the same aim, the unified representations of the usability of models and packages strengthen the material associated with these specific packages. While some limitations regarding package practicalities have arisen from our analysis, we hope that our framework will help developers to improve their packages by providing more transferable methods. We also note that numerous features currently under development were not examined in this article.

Despite a thorough selection of packages, some of them were discarded from the analysis. Indeed, as pointed out in Sect.

This work could be considered as an early stage version of a meta-package that would manage to run all the packages through the same R architecture, thus improving guidance, appropriate constraint and reliable comparisons when using hydrological models with R.

Example of plots enabled by the

Example of a plot enabled by the

Example of plots enabled by the

Example plots of simulation results from two model configurations enabled by the

Example of plots enabled by the

Graphical user interface of

Graphical user interface of

Graphical user interface of the

Graphical user interface of the

R scripts to use the packages on simple hydrology examples are provided under the terms of the GNU General Public License 2.0 and are available at

PCA, GT and OD designed the study. PCA conducted the analyses and wrote the paper. All authors discussed the design and the results and contributed to the final paper.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors thank Météo-France and the hydrometric services of the French government (SCHAPI) for, respectively, providing the climatic and hydrological data used to calculate the computation times. We also thank the French National Geographic Institute (IGN) for providing the digital elevation model data. We would like to thank Charles Perrin, Vazken Andréassian, Maria-Helena Ramos, François Bourgin, Léonard Santos and Mattia Neri, for their useful remarks on, and suggestions for, the paper. Finally, we thank Jan Seibert (the editor), Elena Toth, Anthony Ladson and one anonymous referee for their comments that significantly improved the paper.

This work was supported by the French Ministry of Environment (DGPR/SNRH/SCHAPI), which provided financial support to the PhD grant of the first author.

This paper was edited by Jan Seibert and reviewed by Elena Toth, Anthony Ladson, and one anonymous referee.