Using a multi-hypothesis framework to improve the understanding of flow dynamics during flash floods

A method of multiple working hypotheses was applied to a range of catchments in the Mediterranean area to analyse different types of possible flow dynamics in soils during flash flood events. The distributed, process-oriented model, MARINE, was used to test several representations of subsurface flows, including flows at depth in fractured bedrock and flows through preferential pathways in macropores. Results showed the contrasting performances of the submitted models, revealing different hydrological behaviours among the catchment set. The benchmark study offered a characterisation of the catchments' reactivity through the description of the hydrograph formation. The quantification of the different flow processes (surface and intra-soil flows) was consistent with the scarce in situ observations, but it remains uncertain as a result of an equifinality issue. The spatial description of the simulated flows over the catchments, made available by the model, enabled the identification of counterbalancing effects between internal flow processes, including the compensation for the water transit time in the hillslopes and in the drainage network. New insights are finally proposed

1 Introduction

1.1 Flash flood events: an issue for forecasters

Flash floods are "sudden floods with high peak discharges, produced by severe thunderstorms that are generally of limited areal extent" (IAHS-UNESCO-WMO, 1974; Garambois, 2012; Braud et al., 2014). They are often linked to localised and major forcings (greater than 100 mm; Gaume et al., 2009) at the heads of steep-sided, mesoscale catchments (with surface areas of 10-250 km$^2$).
The large specific discharges and intensities of precipitation lead to the flash floods being classified as extreme. Nevertheless, those events are neither scarce nor unusual, since on average there were no fewer than five flash floods a year in the Mediterranean Arc between 1958 and 1994 (Jacq, 1994), and they tend to be amplified against a background of climate change (Llasat et al., 2014; Colmet Daage et al., 2016). Flash floods constitute a significant hazard and are therefore a considerable risk for populations (UNISDR, 2009; Llasat et al., 2014). They are particularly dangerous due to their characteristics, namely that (i) the suddenness of events makes it difficult to warn populations in time, and this can lead to panic, thus increasing the risk when a population is unprepared (Ruin et al., 2008), (ii) the traditional connected monitoring systems are not adapted to the temporal and spatial scales of the flash floods (Borga et al., 2008; Braud et al., 2014), and (iii) the magnitude of floods implies significant amounts of kinetic energy, which can transform transitory rivers into torrents, resulting in the transport of debris ranging from fine sediments to tree trunks as well as the scouring of river beds and the erosion of banks (Borga et al., 2014).
A major area of interest for flash floods is, therefore, better risk assessment, which enables them to be forecast and the relevant populations to be warned in advance. Greater knowledge and understanding are required to better identify the determining factors that result in flash floods. In particular, in order to implement a regional forecasting system, the properties of the catchments, the climatic forcing and the linkages between them that lead to flash flood events need to be characterised.
Published by Copernicus Publications on behalf of the European Geosciences Union.
1.2 Flash flood events: understanding flow processes

Due to the challenges involved in forecasting flash floods, there has been considerable research done on the subject over the last 10 years. Examples include the HYDRATE project (Hydrometeorological data resources and technologies for effective flash flood forecasting, 2006-2010; Gaume and Borga, 2013), which enabled the setting up of a comprehensive European database of flash flood events as well as the development of a reference methodology for the observation of post-flood events; the EXTRAFLO project (EXTreme RAinfall and FLOod estimation, 2009-2013; Lang et al., 2014), to estimate extreme precipitation and floods for French catchments; the HYMEX project (HYdrological cycle in the Mediterranean EXperiment, 2010-2020; Drobinski et al., 2014), focusing on the meteorological cycle at the Mediterranean scale and particularly on the conditions that allow extreme events to develop; the FLASH project (Flooded Locations and Simulated Hydrographs, 2012-2017; Gourley et al., 2017), assessing and improving a flash flood forecasting framework in the USA on the basis of real-time hydrological modelling with high-resolution forcing; and the FLOODSCALE project (Multi-scale hydrometeorological observation and modelling for flash floods understanding and simulation, 2012-2016; Braud et al., 2014), based on a multiscale experimental approach to improve the observation of the hydrological processes that lead to flash floods.
In the northwestern Mediterranean context, which is especially affected by autumnal convective meteorological events, the European research cited above particularly demonstrates the importance of cumulative rainfall (Arnaud et al., 1999; Sangati et al., 2009; Camarasa-Belmonte, 2016), the previous soil moisture state (Cassardo et al., 2002; Marchandise and Viel, 2009; Hegedüs et al., 2013; Mateo Lázaro et al., 2014; Raynaud et al., 2015) and the storage capacity of the area affected by the precipitation (Viglione et al., 2010; Zoccatelli et al., 2010; Lobligeois, 2014; Garambois et al., 2015a; Douinot et al., 2016). The combined influence of the spatial distribution of precipitation and event-related storage capacities, reported in the study of a number of particular events (Anquetin et al., 2010; Le Lay and Saulnier, 2007; Laganier et al., 2014; Garambois et al., 2014; Faccini et al., 2016), suggests that there is a hydrological reaction in some areas of the catchments that arises from localised soil saturation. This statement surmises that there is little direct Hortonian flow, but rather that there is a production of runoff through excess soil saturation or lateral fluxes in the soil resulting from the activation of preferential pathways.
The geochemical monitoring of eight intense precipitation events over a 3.9 km$^2$ catchment area (Braud et al., 2014) underlined the dominance of the intra-soil dynamic. First, an analysis of the water from the first 40 cm of the soil layer revealed a flushing phenomenon, the water present at the start being replaced by so-called new rainwater (Braud et al., 2016a; Bouvier et al., 2017). In addition, even if the peaks of the floods mainly consisted of new water, with a proportion varying between 50 % and 80 %, it appears that over the entire period of the events, old water accounts for between 70 % and 80 % of the total volume of water discharged, which supports the dominance of the water pathways in the soil.
Finally, the geological properties themselves appear to be markers of the storage capacities available over the timescales involved in flash floods (that are of the order of a day). From simple flow balances of flash flood events (Douinot, 2016), studies of the diverse hydrological responses of several catchments over the same precipitation episode (Payrastre et al., 2012) or the application of regional hydrological models dedicated to flash flood simulation (Garambois et al., 2015b), the literature tends to demonstrate the low storage capacity of non-karst sedimentary catchments and marl-type catchments, and, conversely, the potential for storing large volumes of water in the altered rocks of granitic or schist formations.

Applying a multi-hypothesis framework for improving the hydrological understanding of the flash flood events

The knowledge gained about the development of the flow processes (for example, the tracing of events carried out during the FLOODSCALE project; Braud et al., 2014) relates to studies on a number of specific sites where flash floods could be observed while they were taking place. However, the ability to generalise this knowledge is limited by the specific nature of each study (McDonnell et al., 2007) and by the gap between the spatial scale of forecasts (mesoscale) and that of the in situ observations (< 10 km$^2$) (Sivapalan, 2003). Hydrological modelling work can be considered as a means of extrapolating knowledge to an extended geographical area, possibly covering catchments with differing physiographic properties. Moreover, hydrological models viewed as "tentative hypotheses about catchment dynamics" are interesting tools for testing hypotheses about hydrological functioning using a systematic methodology. A considerable number of recently published works have involved comparative studies, using numerical models to develop or validate the hypotheses about the type of hydrological functioning that is most likely to reproduce hydrological responses accurately (Buytaert and Beven, 2011; Clark et al., 2011; Fenicia et al., 2014, 2016; Coxon et al., 2014; Ley et al., 2016). Because the models compared share the same structure and differ solely in the hypotheses tested, which are implemented as modules, the comparison is focused on and restricted to the hydrological assumptions tested. Doing this avoids the limitations on interpretation that are often encountered in comparative studies of models (Van Esse et al., 2013), where numerical choices can influence results independently of the underlying assumptions.
The multiple working hypotheses framework is usually applied using a flexible conceptual and lumped model framework, such as FUSE (Framework for Understanding Structural Errors, Clark et al., 2008) or SUPERFLEX (Flexible framework for hydrological modeling, Fenicia et al., 2011). However, Clark et al. (2015a, b) have also proposed a unified structure to test multiple working hypotheses within a distributed modeling framework. To our knowledge, the case studies using the aforementioned frameworks are related to continuous hydrological studies in order to assess hydrological hypotheses through the overall hydrological signature of the catchments. In this work, we extend the method of multiple working hypotheses to the assessment of an event-based hydrological model framework.
The objective is to test a number of proposed hydrological mechanisms that occur during flash flood events in a set of contrasting catchments in the French Mediterranean area. While the proportion of flows passing through the soil appears to be significant, questions arise about how they form:
- Are they subsurface flows that take place in a restricted area of the root layer as a result of preferential path activation? Or are they lateral flows taking place at greater depth, comparable to those seen in some aquifers?
- Does the geological bedrock or an altered substratum play a role limited to that of a mere storage reservoir, or is it actively involved in the formation of flood flows?
- What are the proportions of the different flow processes, according to the events and the catchments?
The aim of this article is to attempt to answer these questions using a multi-model approach that tests different types of hydrological dynamics. The study was based on MARINE (Modélisation de l'Anticipation du Ruissellement et des Inondations pour des évéNements Extrêmes), a physically based, distributed hydrological model (Roux et al., 2011; Garambois et al., 2015a), which was developed specifically to model flash floods in the catchments of the French Mediterranean Arc. Several new representations for the soil column and underground flows were proposed (Douinot, 2016) and included in the MARINE model in the form of modules that can be used to test different hydrological functions (Sect. 3). Those different hydrological dynamics were applied to a set of catchments, presented in Sect. 2, with physiographic properties representative of the whole of the French Mediterranean Arc. The performance of each model was then examined and subjected to a comparative study (Sects. 4 and 5). Finally, the contribution of the results to the understanding of hydrological functioning is discussed in Sect. 6 before concluding.
2 Catchments and data used in the study

Study catchment set
We studied the behaviour of four catchments and eight nested catchments in the French Mediterranean Arc (Fig. 1). The catchments (in the order they are numbered in Fig. 1) were those of the Ardèche, Gard, Hérault and Salz rivers. These were selected for the following reasons: (i) they are representative of the physiographic variability found in areas where flash floods occur, (ii) numerous studies of flash floods have already been carried out on the Gard and Ardèche (Ruin et al., 2008; Anquetin et al., 2010; Delrieu et al., 2005; Maréchal et al., 2009; Braud et al., 2014) that could guide the interpretation of the modelling results (Fenicia et al., 2014), and (iii) a considerable number of observations of flash flood events are available for these catchments.
The main physiographical and hydrological properties of the catchments are presented in Table 1. Figure 2 shows the contrasting geological properties of the studied area; the catchments are marked by a clear upstream-downstream difference. The Ardèche catchment upstream of Ucel essentially sits on a granite bedrock with some sandstone on its edges, while downstream the geology changes to predominantly schist and limestone formations. Similarly, the upstream part of the Gard catchment consists of schistose bedrock, while downstream the bedrock is impermeable marl-type and granite formations. The Hérault catchment is split into mostly schist and granitic head watersheds (the Valleraugue and la Terrisse sub-catchments) and a predominantly limestone plateau (Saint-Laurent-le-Minier sub-catchment). Finally, the Salz is characterised by sedimentary bedrock comprised of sandstone and limestone (Fig. 2).
The Ardèche and the Gard catchments have been subject to intensive monitoring and studies (see later references, https://deims.org/site/czo_eu_fr_024, last access: 10 May 2018), leading to prior knowledge on hydrological understanding. Both the local in situ experiments (Ribolzi et al., 1997; Braud and Vandervaere, 2015; Braud et al., 2016a, b) and modelling studies focused on this area (Garambois et al., 2013; Vannier et al., 2013) tend to support a hydrological classification according to those contrasting geological properties, in agreement with the usual hydrogeological signature found in the literature (Sayama et al., 2011; Pfister et al., 2017a). Marl, sandstone and limestone without karst are characterised by limited storage capacities, resulting in higher runoff coefficients and high sensitivity to the initial soil moisture (Ribolzi et al., 1997; Braud et al., 2016a). In contrast, in granite and schist transects located on the hillslope of the Ardèche catchment, infiltration tests and analyses of electrical resistivity signals show the high permeability of the geological substratum at depth (measured down to 2.5 m); high storage capacities reach up to 600 mm in 7 out of 10 assessments with artificial forcing, and the three remaining tests suggest local unaltered bedrock (Braud et al., 2016a, b). The natural resistivity profile suggests a regular soil-bedrock interface when the latter consists of schist, while the granite one presents a more chaotic structure. Finally, the continuous comparative study of two experimental sites over surface areas of the order of 1 km$^2$, one located on the schist upstream part of the Gard catchment and the other on the downstream granite part, suggests that there is rapid subsurface flow processing on the schist area, while flow formation appears to be controlled by the extension of the saturated zone related to the river on the granitic site (Ayral et al., 2005; Maréchal et al., 2009, 2013).

Table 1. Physiographical and hydrological properties of the catchments. ID is the coding name of the catchments used in Fig. 1 and Table 2, and the following are represented: area (km$^2$), mean slope (-), and soil properties, namely mean soil depth (m) and main soil texture (Tx): sandy loam (Ls), loam (L), silty loam (Lsi). In terms of geology, the following are represented: percentage of bedrock geology (%), including subcategories of sandstone (Sa), limestone (Li), granite and gneiss (GG), marl (Ma) and schists (Sc). Bold values represent the dominant geology. Mean annual precipitation is $P$ (mm). In terms of hydrometry, period is the discharge time-series availability, the mean inter-annual discharge is $Q$ (m$^3$ km$^{-2}$ s$^{-1}$), the 2-year return period of maximum daily discharge is $Q_{D2}$ (m$^3$ km$^{-2}$ s$^{-1}$), and the 10-year return period of maximum hourly discharge is $Q_{H10}$ (m$^3$ km$^{-2}$ s$^{-1}$). Hydrometric statistics are calculated from the HydroFrance databank (http://www.hydro.eaufrance.fr/, last access: 10 May 2018), and the pluviometric ones use rainfall data from the rain gauge network of the French flood forecasting services.
ID no. 1a: L'Ardèche at Vogüé, area 622 km$^2$, mean slope 0.17.

Forcing inputs and hydrometric data
The hydrometric data were derived from the network of operational measurements (HydroFrance databank, http://www.hydro.eaufrance.fr/, last access: 10 May 2018). Eight to twenty years of hourly discharge observations were available, according to the dates when the hydrometric stations were installed (Table 1). Flood events with peak discharges that had exceeded the 2-year return period for daily discharge ($Q_{D2}$ in Table 1, which corresponds to the alert threshold for flood forecasting centres in France) were selected as events to be included in the study. Thus, only one criterion for hydrological response was considered. This led to a selection of precipitation events of varying origins (for instance, rainfall induced by mountains, stagnant convective cells and rainfall occurring in different seasons, mainly in autumn and early spring). Such a selection risked complicating the study because flow processes can vary from one season to another. Nevertheless, it allowed us to test the ability of the model to deal with different (non-linear) flow physics regimes. Note also that moderate or intense rainfall events that did not produce a corresponding hydrological response are excluded from the analysis by this criterion. Nevertheless, the first alert threshold used here is small enough to yield a selection of flood events with contrasting runoff coefficients (see Table 2).
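The selection rule described above (keep an event when its peak discharge exceeds $Q_{D2}$) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the numerical values are hypothetical and serve only as an illustration; they are not taken from the study's data.

```python
def select_flood_events(events, q_d2):
    """Return the events whose peak discharge exceeds the threshold q_d2.

    events: list of dicts holding a "q_peak" entry (same unit as q_d2).
    q_d2:   2-year return period of maximum daily discharge.
    """
    return [event for event in events if event["q_peak"] > q_d2]


# Hypothetical specific discharges (m3 km-2 s-1), for illustration only.
events = [
    {"name": "event_a", "q_peak": 0.4},
    {"name": "event_b", "q_peak": 1.8},
    {"name": "event_c", "q_peak": 0.9},
]
selected = select_flood_events(events, q_d2=0.5)
print([event["name"] for event in selected])
```

Because the rule relies on a single hydrological criterion, no rainfall information enters the selection, which is why events of varying meteorological origin end up in the same set.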
Precipitation measurements were taken from Météo France's ARAMIS (Application Radar à la Météorologie Infra-Synoptique) radar network (Tabary, 2007), which provides precipitation measurements at a resolution of 1 km × 1 km every 5 min. The French flood forecasting service (SCHAPI, Service central d'hydrométéorologie et d'appui à la prévision des inondations) then used the CALAMAR patented software (Badoche-Jacquet et al., 1992) to produce rainfall depth data by combining these radar measurements with rain gauge data. This processed dataset is used here as input for the model. Each rainfall product is first assessed through an individual sensitivity analysis of the standard MARINE model (DWF model; see Sect. 3.1). When the model shows an atypical sensitivity to the soil depth parameter, the rainfall event is discarded from the study, as this suggests questionable measurements. Depending on the availability of rainfall and hydrometric measurements, 7 to 14 intense events were selected for each catchment (Table 2). Each set is finally split into calibration and validation subsets as follows: the extreme events were kept for validation, and a minimum of three calibration events were chosen in order to cover the wide range of initial soil moisture conditions.

Figure 3. Structure of the MARINE model. The Green-Ampt infiltration equation contains the following parameters: infiltration rate $i$ (m s$^{-1}$), cumulative infiltration $I$ (mm), saturated hydraulic conductivity $K$ (m s$^{-1}$), soil suction at the wetting front (m), and saturated and initial water contents, $\theta_s$ and $\theta_i$ (m$^3$ m$^{-3}$), respectively. Subsurface flow contains the following parameters: soil thickness (m), lateral saturated hydraulic conductivity $K$ (m s$^{-1}$), local water depth $h$ (m), transmissivity decay with depth $m_h$ (m) and bed slope $S$ (m m$^{-1}$). The kinematic wave contains the following parameters: surface water depth $h$ (m), time $t$ (s), space variable $x$ (m), rainfall rate $r$ (m s$^{-1}$), infiltration rate $i$ (m s$^{-1}$), bed slope $S$ (m m$^{-1}$) and Manning roughness coefficient $n$ (m$^{-1/3}$ s). Module 2 described in this figure corresponds to the standard definition applied in the MARINE model.
As the MARINE model is event-based, it must be initialised to take into account the previous moisture state of the catchment, which is linked to the history of the hydrological cycle. This was done using spatial model outputs from Météo-France's SIM (Safran-Isba-Modcou, Habets et al., 2008) operational chain, including a meteorological analysis system (SAFRAN; Vidal et al., 2010), a soil-vegetation-atmosphere model (ISBA; Mahfouf et al., 1995) and a hydrogeological model (MODCOU; Ledoux et al., 1989). Based on the work of Marchandise and Viel (2009), the spatial daily root-zone humidity outputs (resolution of 8 km × 8 km) simulated by the SIM conceptual model were used for the systematic initialisation of MARINE.
3 The multi-hypothesis hydrological modelling framework

The MARINE model
The MARINE model is a distributed mechanistic hydrological model especially developed for flash flood simulations. It models the main physical processes in flash floods: infiltration, overland flow, lateral flows in soil and channel routing. Conversely, it does not incorporate low-rate flow processes such as evapotranspiration or base flow. MARINE is structured into three main modules that are run for each catchment grid cell (see Fig. 3). The first module allows for the separation of surface runoff and infiltration using the Green-Ampt model. The second module represents the subsurface downhill flow. It was initially based on the generalised Darcy's law used in the TOPMODEL (TOPography-based) hydrological model (Beven and Kirkby, 1979), but it was developed in greater detail as part of this study (see Sect. 3.2). Lastly, the third module represents overland and channel flows. Rainfall excess is transferred to the catchment outlet using the Saint-Venant equations simplified with kinematic wave assumptions. The model distinguishes grid cells with a drainage network, where channel flow is calculated on a triangular channel section (Maubourguet et al., 2007), from grid cells on hillslopes, where the overland flow is calculated for the entire surface area of the cell.
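As an illustration of the first module, the sketch below implements the textbook form of the Green-Ampt infiltration capacity, $i = K\,(1 + \psi\,(\theta_s - \theta_i)/I)$, and splits a rainfall rate into infiltration and rainfall excess. This is a minimal sketch under stated assumptions: the function names are hypothetical, all lengths are expressed in metres for unit consistency, and the exact numerical formulation used in MARINE may differ.

```python
import math


def green_ampt_capacity(k_sat, psi, theta_s, theta_i, cum_infiltration):
    """Infiltration capacity (m/s) from the textbook Green-Ampt relation.

    k_sat: saturated hydraulic conductivity (m/s)
    psi: soil suction at the wetting front (m)
    theta_s, theta_i: saturated and initial water contents (m3/m3)
    cum_infiltration: cumulative infiltration (m)
    """
    if cum_infiltration <= 0.0:
        # Before any water has infiltrated, the capacity is unbounded.
        return math.inf
    return k_sat * (1.0 + psi * (theta_s - theta_i) / cum_infiltration)


def split_rainfall(rain_rate, k_sat, psi, theta_s, theta_i, cum_infiltration):
    """Split a rainfall rate (m/s) into (infiltration, rainfall excess)."""
    capacity = green_ampt_capacity(k_sat, psi, theta_s, theta_i, cum_infiltration)
    infiltration = min(rain_rate, capacity)
    return infiltration, rain_rate - infiltration


# Hypothetical values, for illustration only.
infiltration, excess = split_rainfall(
    rain_rate=5e-6, k_sat=1e-6, psi=0.1,
    theta_s=0.45, theta_i=0.25, cum_infiltration=0.02,
)
print(infiltration, excess)
```

In this sketch the rainfall excess is the quantity that would be handed to the overland flow module, while the infiltrated part would feed the soil column.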
The MARINE model works with distributed input data such as (i) a digital elevation model (DEM) of the catchment to shape the flow pathway and distinguish hillslope cells from drainage network cells according to a drained area threshold, (ii) soil survey data to initialise the hydraulic and storage properties of the soil, which are used as parameters in the infiltration and lateral flow models, and (iii) vegetation and land-use data to configure the surface roughness parameters used in the overland flow model.
The MARINE model requires parameters to be calibrated in order to reproduce hydrological behaviours accurately. Based on sensitivity analyses of the model (Garambois et al., 2013), five parameters are calibrated: the soil depth, $C_z$; the saturated hydraulic conductivity used in lateral flow modelling, $C_{kss}$; the saturated hydraulic conductivity used in infiltration modelling, $C_k$; and the friction coefficients for low- and high-water channels, $n_r$ and $n_p$, respectively, with $n_r$ and $n_p$ being uniform throughout the drainage network. $C_{kss}$, $C_k$ and $C_z$ are the multiplier coefficients for spatialised, saturated hydraulic conductivities and soil depths. In this study, modifications of Module 2 (i.e. subsurface downhill flow) were tested to assess several possible ways of representing the intra-soil hydrological functioning. Consequently, instead of $C_z$ and $C_{kss}$, new calibration parameters were introduced, as described in the following section.

Hydrol. Earth Syst. Sci., 22, 5317-5340, 2018 www.hydrol-earth-syst-sci.net/22/5317/2018/

Table 2. Properties of the flash flood events as an average on the event set (± standard deviation). ID is the coding name of the concerned catchments (see Fig. 1: no. 1 for the Ardèche, no. 2 for the Gard, no. 3 for the Hérault and no. 4 for the Salz); $N_{evt}$ is the number of observed flash flood events; $P$ (mm) is the mean precipitation; $I_{max}$ (mm h$^{-1}$) is the maximal rainfall intensity per event; $Q_{peak}$ is the specific flood peak (m$^3$ km$^{-2}$ s$^{-1}$); Hum is the initial soil moisture according to SIM output (Habets et al., 2008); CR is the runoff coefficient (%).

Modelling lateral flows in the soil: the development of a multi-hypothesis framework
We proposed several modifications to Module 2 (the subsurface downhill flow submodel) covering the three hypotheses of hydrological functioning:

- The deep water flow model (DWF) assumed deep infiltration and the formation of an aquifer flow in highly altered rocks. In hydrological terms, the pedology-geology boundary was transparent. The soil column could be modelled as a single entity of depth $D_{tot}$ (m), which is at least equal to the soil depth $D_{BDsol}$ (m) (see Fig. 4). Given the lack of knowledge and available observations, a uniform calibration was applied to the depth of altered rocks, represented as $D_{WB}$ (m), that is rapidly accessible at the timescale of a rain event. Groundwater flow was described using the generalised Darcy's law ($q_{dw}$, Eq. 1), with $h_{dw}$ (m) as the water depth of the unique water table, $m_h$ (m) as the decay factor of the hydraulic conductivity at saturation with soil depth, and $S$ (-) as the bed slope. The exponential growth of the hydraulic conductivity at saturation as the water table ($h_{dw}$) rises assumed an altered rock structure where hydraulic conductivity at saturation decreases with depth (the TOPMODEL approach).

- The subsurface flow model (SSF) assumed that the formation of subsurface lateral flows was due to the activation of preferential paths, in line with the in situ observations of Katsura et al. (2014) and Katsuyama et al. (2005). The altered soil-rock interface acts as a hydrological barrier. The rapid saturation of shallow soils results in the development of rapid flows due to the steep slopes of the catchments and the existence of rapid water flows circulating through the macropores as the soil becomes saturated. The soil column was thus represented by a two-layer model (see Fig. 5), with the depth of an upper layer equal to the soil depth $D_{BDsol}$ (m) and a lower layer of uniform depth $D_{WB}$ (m). The lateral flows in the upper layer were described by the generalised Darcy's law. However, variations in hydraulic conductivity were expressed as a function of the mean water content of the layer ($\theta_{soil}$) and not of the height of water ($h_{soil}$) that would form a perched water table ($q_{ss}$; Eq. 2), so as to represent the activation of preferential paths in the soil as the degree to which the soil is filled increases. The decay factor of the hydraulic conductivity as a function of the saturation rate, $m_\theta$, was set according to the linearised empirical relations developed by Van Genuchten (1980) between the hydraulic conductivity and soil water content for the different classes of soil textures. Flows in the lower soil layer ($q_{dw}$; Eq. 3) in the form of a deep aquifer were limited by setting the hydraulic conductivity of the substratum as being equivalent to that of the soil divided by 50 (this choice being guided by the orders of magnitude generally observed in the literature; Le Bourgeois et al., 2016; Katsura et al., 2014). The altered rocks were thus assumed to mainly play a storage role. Infiltration occurring between the two layers was initially restricted by the Richards equations, which were incorporated using the set hydraulic properties of the substratum (Eq. 4). When the upper layer is saturated, this allows the lower layer to be filled through a piston effect. The depth of the soil layer, $D_{BDsol}$, was set according to the soil data, while the depth of the substratum, $D_{WB}$, was calibrated in the same way as in the DWF model. In Eqs. (2)-(4), $h_{soil}$ and $h_{WB}$ (m) represent the soil water depth in the upper and lower layer, respectively; $\theta_{soil}$ and $\theta_{WB}$ (-) represent the soil water content of the upper and lower layer, respectively; $m_\theta$ (-) represents the decay factor of the hydraulic conductivity with soil water content $\theta_{soil}$; and $K_{ss} = C_{kss} \cdot K_{BDsol}$ and $K_{dw} = 0.02 \cdot K_{ss}$ (m s$^{-1}$) represent the simulated hydraulic conductivity at saturation of the upper and lower layer in the SSF model, respectively.
- The subsurface and deep water flow model (SSF-DWF) assumed that the presence of subsurface flow was due not only to local saturation of the top of the soil column, but also to the development of a flow at depth, as a result of significant volumes of water introduced by infiltration and a very altered substratum whose apparent hydraulic conductivity was already relatively high. This hypothesis of the process led to a modelling approach analogous to the SSF model (Fig. 5), where the hydraulic conductivity at substrate saturation, $K_{dw}$, was no longer simply imposed, but instead was calibrated using an additional coefficient, $C_{kdw}$. In the SSF-DWF model,

Figure 4. DWF model of flow generation by infiltration at depth and support of a deep aquifer $q_{dw}(h_{dw})$ (Eq. 1).

Figure 5. SSF and SSF-DWF models of flow generation by the saturation of the upper part of the soil column and activation of preferential paths ($q_{ss}$), with support flow at depth ($q_{dw}$) and water exchanges from the upper layer to the lower one according to both soil water contents, represented by $q_{inf}(\theta_{soil}, \theta_{WB})$. See Eqs. (2), (3) and (4) for the definition of the flows.
The soil water content prior to simulation was similarly initialised for each model in order to ensure that, for a fixed depth of altered rock, the same volume of water was allocated for all models. The SIM humidity indices (Sect. 2.2) were used to set an overall water content for all groundwater flow models for a given flood.
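To make the DWF and SSF assumptions more concrete, the following Python sketch computes a lateral flow per unit width for a saturated conductivity that decays exponentially with depth, and the substratum conductivity imposed in the SSF model. It is a generic TOPMODEL-style illustration and not a transcription of Eqs. (1)-(4): the profile $K(z) = K_0 \exp(-z/m_h)$, its integration over the saturated thickness and the function names are assumptions made here; only the ratio $K_{dw} = 0.02 \cdot K_{ss}$ is taken from the text.

```python
import math


def dwf_lateral_flow(h_dw, d_tot, k_surface, m_h, slope):
    """Lateral flow per unit width (m2/s) for a water table of thickness h_dw.

    Assumed profile: K(z) = k_surface * exp(-z / m_h), with z the depth below
    the surface. The transmissivity is the integral of K(z) over the saturated
    zone, i.e. from z = d_tot - h_dw down to z = d_tot.
    """
    h = min(max(h_dw, 0.0), d_tot)  # keep the water table inside the column
    transmissivity = k_surface * m_h * (
        math.exp(-(d_tot - h) / m_h) - math.exp(-d_tot / m_h)
    )
    return transmissivity * slope


def ssf_substratum_conductivity(k_ss):
    """Substratum conductivity imposed in the SSF model (soil value / 50)."""
    return 0.02 * k_ss


# Hypothetical values, for illustration only.
for h in (0.0, 0.5, 1.0, 2.0):
    print(h, dwf_lateral_flow(h, d_tot=2.0, k_surface=1e-4, m_h=0.5, slope=0.2))
print(ssf_substratum_conductivity(1e-4))
```

With this profile the flow grows roughly exponentially as the water table rises towards the surface, which is the behaviour described for the DWF hypothesis; in the SSF-DWF hypothesis the fixed factor 0.02 would be replaced by a calibrated coefficient.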
4 Methodology for calibrating and evaluating the models

Calibration method
The three hydrological models studied, DWF, SSF and SSF-DWF, were calibrated for each catchment by weighting 5000 randomly drawn samples from the parameter space for each model (the Monte Carlo method). The weighting was done using the DEC (Discharge Envelope Catching) score (Eq. 6; discussed by Douinot et al., 2017) in order to integrate the a priori uncertainties of modelling $\sigma_{mod,i}$, $i = 1 \ldots n$, as represented by Eq. (7), and those related to the flow measurements $\sigma_{\hat{y}_i}$, $i = 1 \ldots n$, as represented by Eq. (8). The choice of DEC is justified by the desire to adapt the evaluation criterion to the modelling objectives (for example, by focusing calibration on the reproduction of the rise and peaks of floods in order to be able to forecast flash floods) while always being aware of the uncertainties in the reference flow measurements.
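The sampling step of such a Monte Carlo calibration can be sketched as follows. This is a minimal sketch under stated assumptions: the parameter bounds are hypothetical, the sampling is uniform and independent, and the inverse-score weighting is only a placeholder, since the way the DEC score is converted into weights is not specified in this section.

```python
import random


def sample_parameter_sets(bounds, n_samples=5000, seed=0):
    """Draw n_samples parameter sets uniformly within the given bounds.

    bounds: dict mapping a parameter name to a (lower, upper) tuple.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(n_samples)
    ]


def normalised_weights(scores):
    """Turn error scores (lower is better) into weights that sum to 1.

    Inverse-score weighting is an illustrative assumption, not the study's rule.
    """
    raw = [1.0 / (score + 1e-12) for score in scores]
    total = sum(raw)
    return [value / total for value in raw]


# Hypothetical bounds for three of the calibrated parameters.
bounds = {"C_k": (0.1, 10.0), "D_WB": (0.0, 3.0), "n_r": (0.01, 0.1)}
samples = sample_parameter_sets(bounds, n_samples=5000, seed=42)
print(len(samples), samples[0])
print(normalised_weights([0.5, 1.0, 2.0]))
```

In practice each sampled set would be run through the model for every calibration event, and the resulting scores would then be passed to the weighting step.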
Given the lack of information, these uncertainties $\sigma_{\hat{y}_i}$, $i = 1 \ldots n$, were set at 20 % of the measured discharge, which is in line with the literature on discharge measurements from operational stations (Le Coz et al., 2014), and were increased linearly beyond the 10-year hourly discharge, above which, as a general rule, the observed flow is no longer measured but is derived by extrapolation from a rating curve, making it less accurate (Eq. 8). The envelope $\hat{y}_i \pm 2\sigma_{\hat{y}_i}$, $i = 1 \ldots n$, consequently defines the 95 % confidence interval of the observed flows.
The modelling uncertainties $\sigma_{mod,i}$, $i = 1 \ldots n$, were set at a minimum value (as a function of the mean inter-annual discharge of the catchment), thus ensuring that the evaluation of the hydrographs would not be unduly affected by the reproduction of relatively low flows, which were strongly dependent on initialisation using previous moisture data that were not the subject of this study. In addition, it was assumed that a modelling uncertainty of 10 % around the confidence interval of observed flows was acceptable (Eq. 7). Finally, the overall envelope $\hat{y}_i \pm 2\sigma_{\hat{y}_i} \pm 2\sigma_{mod,i}$, $i = 1 \ldots n$, defines hereafter the acceptability zone, that is to say the interval in which any simulated flow would be considered as acceptable, according to the modelling and measurement uncertainty definitions.
with $\mathrm{DEC}_i$ as the DEC modelling error at time $i$, $\hat{y}_i$ and $\sigma_{\hat{y}_i}$ as the observed discharge and the uncertainty of measurement at time $i$, $d_i$ as the discharge distance between the model prediction at time $i$ ($y_i$) and the confidence interval of the observed flow at time $i$ ($\hat{y}_i \pm 2\sigma_{\hat{y}_i}$), $\sigma_{\mathrm{mod},i}$ as the modelling uncertainty at time $i$, and $Q$ and $Q_{H10}$ as the mean inter-annual discharge and the 10-year maximum hourly discharge of the related catchment, respectively.
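As an illustration of these definitions, the following minimal sketch computes the distance $d_i$ to the confidence interval of the observed flow and a standardised error. The function name `dec_errors`, the signed convention for $d_i$ and the normalisation by $2\sigma_{\mathrm{mod},i}$ are assumptions made here for illustration (chosen so that an absolute value above 1 corresponds to leaving the acceptability zone); they may differ from the exact form of Eq. (6). The uncertainties are passed in as inputs rather than derived from Eqs. (7) and (8).

```python
import numpy as np

def dec_errors(y_sim, y_obs, sigma_obs, sigma_mod):
    """Standardised, signed modelling error per time step (assumed form)."""
    y_sim = np.asarray(y_sim, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    sigma_mod = np.asarray(sigma_mod, dtype=float)

    # 95 % confidence interval of the observed flow
    lower = y_obs - 2.0 * sigma_obs
    upper = y_obs + 2.0 * sigma_obs

    # d_i: zero inside the interval, negative below it, positive above it
    d = np.where(y_sim > upper, y_sim - upper,
                 np.where(y_sim < lower, y_sim - lower, 0.0))

    # Assumed normalisation: |error| > 1 means outside the acceptability zone
    return d / (2.0 * sigma_mod)
```

With an observed flow of 100, $\sigma_{\hat{y}_i} = 20$ and $\sigma_{\mathrm{mod},i} = 10$ (hypothetical values), a simulated flow of 150 lies 10 units above the confidence interval and would give an error of 0.5 under this convention.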

Metrics and key points in model evaluation and comparison
Results of the models were first assessed and benchmarked using performance scores (Sect. 5.1). The evaluation focused on the performance of the models in reproducing the hydrographs in overall terms but also more specifically on their ability to reproduce the characteristic stages of floods: rising flood waters, high discharges and flood recession. These stages were defined as follows:
- The period of rising flood waters is between the moment when the observed flow rate exceeds the mean inter-annual discharge of the catchment and the date of the first flood peak.
- The stage of high discharges includes the points for which the observed flow was greater than 0.25 times the maximum flow during the event.
- The stage of flood recession begins after a period of $t_c$, which is the catchment concentration time according to Bransby's formula (Pilgrim and Cordery, 1992), after the peak of the flood, and ends when discharge is rising again (or, where appropriate, at the end of the event, which is the time of peak flooding + 48 h).
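The three stage definitions above can be sketched as masks over an observed hydrograph. This is only an illustration: the function name `flood_stages` and its arguments are hypothetical, the concentration time `t_c` is supplied by the user rather than computed from Bransby's formula, and the first flood peak is approximated by the event maximum (a single-peak assumption).

```python
import numpy as np

def flood_stages(q_obs, q_mean, t_c, dt=1.0, max_after_peak=48.0):
    """Boolean masks (rising, high, recession) for one observed event.

    q_obs: observed discharge series; q_mean: mean inter-annual discharge;
    t_c, dt and max_after_peak are expressed in hours.
    """
    q = np.asarray(q_obs, dtype=float)
    idx = np.arange(q.size)
    peak = int(np.argmax(q))  # assumption: single-peak event

    # Rising flood waters: from first exceedance of q_mean up to the peak
    rising = np.zeros(q.size, dtype=bool)
    above = np.flatnonzero(q > q_mean)
    if above.size and above[0] <= peak:
        rising = (idx >= above[0]) & (idx <= peak)

    # High discharges: flow greater than 0.25 times the event maximum
    high = q > 0.25 * q[peak]

    # Recession: starts t_c after the peak, ends at peak + 48 h at the latest
    start = peak + int(np.ceil(t_c / dt))
    end = min(q.size - 1, peak + int(round(max_after_peak / dt)))
    # Stop just before the first time step at which discharge rises again
    for i in range(start + 1, end + 1):
        if q[i] > q[i - 1]:
            end = i - 1
            break
    recession = (idx >= start) & (idx <= end)
    return rising, high, recession
```

The masks can then be used to restrict any score to a given stage of the hydrograph.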
The DEC score provided a standard assessment of the modelling errors, enabling a reasonable weighting of the simulations. However, for ease of interpretation, the percentage of acceptable points of the simulated median time series, Qmed_INT [%] (Douinot et al., 2017), was chosen to evaluate the ability of the models to reproduce overall flows, rising flood waters and high discharges. A point is defined as acceptable when the median simulated value lies within the modelling acceptability zone $\hat{y}_i \pm 2\sigma_{\hat{y}_i} \pm 2\sigma_{\mathrm{mod},i}$, $i = 1, \ldots, n$.
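A minimal sketch of this score is given below, assuming that the median simulated series and both uncertainty series are already available as arrays. The function name `qmed_int` and the optional `mask` argument (used to restrict the score to one stage of the hydrograph) are hypothetical.

```python
import numpy as np

def qmed_int(y_med, y_obs, sigma_obs, sigma_mod, mask=None):
    """Percentage of time steps at which the median simulation is acceptable."""
    y_med, y_obs, sigma_obs, sigma_mod = (
        np.asarray(a, dtype=float) for a in (y_med, y_obs, sigma_obs, sigma_mod)
    )
    # Half-width of the acceptability zone around the observed flow
    half_width = 2.0 * sigma_obs + 2.0 * sigma_mod
    acceptable = np.abs(y_med - y_obs) <= half_width
    if mask is not None:
        acceptable = acceptable[np.asarray(mask, dtype=bool)]
    return 100.0 * acceptable.mean() if acceptable.size else float("nan")
```

For example, with an observed flow of 100, $\sigma_{\hat{y}_i} = 20$ and $\sigma_{\mathrm{mod},i} = 5$ (hypothetical values), the acceptability zone is [50, 150], so median simulated values of 100 and 149 are acceptable while 151 and 40 are not.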
Conversely, Qmed_INT was not relevant for the evaluation of the capacity to reproduce recessions, because the calculation of this score during the recession interval strongly depends on performance at high discharges. Instead, we used the $A_\mathrm{slope}$ score defined in Eq. (9). It calculates the average standard error in simulating the rate of decrease of the discharge during the flood recession interval. By using the $A_\mathrm{slope}$ score here, it was assumed that the recession rate is a relevant feature of the catchment's hydrological properties (Troch et al., 2013; Kirchner, 2009).
where $\mathrm{d}\hat{y}_i/\mathrm{d}t$ and $\mathrm{d}y_i/\mathrm{d}t$ are the observed and the simulated recession rates, respectively, at a time step $i$ that belongs to the flood recession interval $i = k, \ldots, l$.
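To make the idea of a recession-rate score concrete, the sketch below compares finite-difference recession rates of the observed and simulated series over the recession interval. The specific form used here (mean absolute error of the simulated rate relative to the observed rate) is an assumption for illustration and is not necessarily the expression of Eq. (9); the function name `recession_slope_error` is hypothetical.

```python
import numpy as np

def recession_slope_error(y_sim, y_obs, dt=1.0):
    """Mean relative error of the simulated recession rate (assumed form).

    y_sim and y_obs are restricted to the flood recession interval i = k..l.
    """
    y_sim = np.asarray(y_sim, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    rate_sim = np.diff(y_sim) / dt  # finite-difference dy/dt
    rate_obs = np.diff(y_obs) / dt  # finite-difference d(y_obs)/dt
    valid = rate_obs != 0.0         # skip steps with no observed change
    rel_err = np.abs(rate_sim[valid] - rate_obs[valid]) / np.abs(rate_obs[valid])
    return float(np.mean(rel_err))
```

A value of 0 would indicate that the simulated discharge decreases at exactly the observed rate, whatever the offset between the two series.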
The evaluation was completed through the description of the modelling errors (Sect. 5.2) in order to identify those that were inherent in the choice of model structure, regardless of the calibration methodology adopted (Douinot et al., 2017). Attention was paid to the a priori and a posteriori confidence intervals of the model simulations. Those confidence intervals were standardised according to the DEC modelling error definition (Eq. 6), defining the a priori and a posteriori confidence intervals of the modelling errors, where $\alpha$-$x\mathrm{th}_i$ is the $x$th percentile of the $\alpha$ modelling error distribution at time $i$.
The latter definition allows for an informative translation of the prior and posterior confidence intervals (Douinot et al., 2017): where the standardised interval is larger than 1, modelling errors are detected (a priori) or remain (a posteriori). In addition, comparing the a priori and a posteriori confidence intervals highlights which of the remaining modelling errors were induced by the model's assumptions and which were induced by the calibration.

Overall performances of the models
Assessment of the performances by catchment. Fig. 6 shows the average and standard deviations of the Qmed_INT scores obtained after the calibration of the DWF, SSF and SSF-DWF models for each catchment studied. The DWF model, assuming deep infiltration and the formation of an aquifer flow in altered bedrock, showed better performance in the Ardèche catchment (no. 1), while in the Gard (no. 2) and the Salz (no. 4) catchments, the SSF and SSF-DWF models, assuming the formation of subsurface flows due to the activation of preferential flow paths by local saturation (SSF) with development of flow at depth (SSF-DWF), produced the most accurate results. On the Hérault catchment (no. 3), the modelling results obtained with each model in terms of Qmed_INT were less obvious, although the SSF-DWF model seemed to stand out to some extent. The differences in model performance were more pronounced for the validation events. The better-performing models tended to be more consistent, with equivalent Qmed_INT scores on calibration and validation events, for example, the DWF model on the Ardèche (no. 1) or the SSF and SSF-DWF models on the Gard (no. 2).

Figure 6. Qmed_INT scores, with mean Qmed_INT scores obtained for the calibration (a) and validation (b) events, by model and catchment. The Qmed_INT scores were calculated for the whole hydrograph. The x axis refers to the ID number of each catchment (Fig. 1). Finally, the mean attribute refers to the average results over all the catchments obtained with each model.

There was also a deterioration in performance in several models that had already been judged as less effective, for example, the SSF and SSF-DWF models on the Ardèche (no. 1) or the DWF model on the two catchments of the Hérault, no. 3c and no. 3d.

SSF model versus SSF-DWF model. As a reminder, the difference between the SSF and SSF-DWF models is that the latter has an extra calibration parameter, $C_{kdw}$, which is able to initialise a significant lateral flow in the subsoil horizons of the soil column (see Eq. 3). The lateral hydraulic conductivity in the deep layer is configured using the hydraulic conductivity from BDsol: $K_{dw} = C_{kdw} \cdot K_{\mathrm{BDsol}}$, with $C_{kdw}$ set to $0.02 \cdot C_{kss}$ in the SSF model and calibrated in the SSF-DWF model. The small differences between the SSF and SSF-DWF models showed that this flexibility does not produce any significant improvement, with the exceptions of the Ardèche catchment at Meyras and the Hérault catchment at Valleraugue. These two areas have a number of common features that could explain the similar modelling results; they are at the heads of high elevation catchments with steep slopes (Table 1) and are subject to considerable annual meteorological forcing. The calibration of $C_{kdw}$ consistently tended to simulate a significant flow at depth for these two catchments, with exclusively higher values from the prior confidence interval having been selected (Fig. 7). In general, the calibration of the $C_{kdw}$ parameter of the SSF-DWF model correlates with the more or less sustained, annual hydrological activity of the catchments; the confidence interval of the $C_{kdw}$ coefficient is restricted to low values for the catchments with low mean inter-annual discharges (no. 2a, no. 2b, no. 2c, no. 3a, no. 3b and no. 4) and inversely for the catchments with high mean inter-annual discharges (no. 1, no. 3c and no. 3d).
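The difference between the two parameterisations of the deep-layer conductivity can be written as a short sketch. The function name `deep_layer_conductivity` and the numerical values in the usage note are hypothetical; only the relation $K_{dw} = C_{kdw} \cdot K_{\mathrm{BDsol}}$ and the factor 0.02 come from the text.

```python
def deep_layer_conductivity(k_bdsol, c_kss, c_kdw=None):
    """Lateral hydraulic conductivity of the deep layer, K_dw = C_kdw * K_BDsol.

    SSF model:     C_kdw is not calibrated and is fixed to 0.02 * C_kss.
    SSF-DWF model: C_kdw is a calibrated parameter passed explicitly.
    """
    if c_kdw is None:
        c_kdw = 0.02 * c_kss  # SSF default
    return c_kdw * k_bdsol
```

For instance, with an illustrative `k_bdsol = 1e-5` and `c_kss = 1000`, the SSF configuration gives a multiplier of 20, whereas the SSF-DWF configuration may select any value of `c_kdw` within its prior range.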

Detailed performances: assessment of the models to simulate the different stages of a hydrograph
Figure 8 shows the detailed assessments according to the specific stages of the hydrographs. It highlights whether the overall performances (Fig. 6) reflect uniform results along the hydrographs or whether they actually hide contrasting likelihoods of the simulations over the different hydrograph stages. Uniform results are observed on the Gard catchment at Corbès and Anduze (no. 2a and no. 2b) and on the Salz catchment (no. 4); the SSF and SSF-DWF models demonstrated clearly superior performances for all stage-specific assessments of those catchments. For the Gard catchment at Mialet (no. 2c), the detailed assessment (Fig. 8) shows that the overall superiority of the SSF and SSF-DWF models is mainly due to a better simulation of the rising limb. Nevertheless, for every score, the SSF and SSF-DWF models both give better modelling results than the DWF model.
On the Ardèche catchments (no. 1a, no. 1b, no. 1c and no. 1d), the overall performances reflect the simulation of the high discharges and of the flood recessions. There, the DWF model gives the best results for simulating those hydrograph stages. Conversely, it deals slightly less well with the simulation of the rising flood waters. As shown in Sect. 5.2, all the models tend to underestimate initial flows prior to the event and during the onset of a flood. The DWF model, in particular, exhibits this modelling weakness; for example, see the onset of floods in the hydrographs for the 18 October 2006 and 1 November 2014 events in Ucel (no. 1b) as depicted in Fig. 10, which explains the poorer performance. Note that the SSF-DWF model simulated the rising flood waters of the Ardèche head watershed (no. 1d) clearly better, which also explains the good overall performance of this model on this catchment (Fig. 6).
On the Hérault, the detailed evaluation enabled us to distinguish the performance of the different models. On the one hand, for the two larger catchments (no. 3a and no. 3b), the DWF model performed slightly better for the simulation of rising flood waters, while the SSF model gave clearly better simulations of the flood recessions. On the other hand, the SSF-DWF model generated the best simulations of the rising flood waters and of the high flows on the upstream catchments of La Terrisse (no. 3c) and Valleraugue (no. 3d), while the DWF model simulated the flood recession better. These contrasting results explain why no specific model stands out on this catchment. In addition, they suggest a marked influence of the physiographic properties on the development of flow processes, because they are correlated with the differences in the geological and topographical properties of the Hérault (no. 3; see Fig. 2 and Table 1). The hydrological behaviours simulated for the Valleraugue and La Terrisse sub-catchments, which are predominantly granitic and schistose and where slopes are very steep, can be distinguished from those of Laroque and Saint-Laurent-le-Minier, which are mainly sedimentary and in the form of large plateaus.

Summary of the assessment
Figure 9 summarises the models that stand out for each assessed hydrograph stage. It shows when one model has a clearly higher performance according to the following definition: a model is assessed as clearly superior when the lower bound of the confidence interval of its score is higher than the median values of the scores obtained with the other models. It reveals that the catchment set might be divided into four groups:
- A first group of catchments is where the SSF and SSF-DWF models uniformly perform similarly to or better than the DWF model. This is the case for the Gard (no. 2) and the Salz (no. 4) catchments.
- A second group of catchments is where the DWF model gives the best results according to all the scores, except for the rising flood waters assessment. This is the case for the downstream Ardèche catchments (no. 1a, no. 1b and no. 1c).
- A third group is where the models' results are not really discernible. For those catchments, the DWF model appears to simulate the rising flood and the high discharge slightly better, while the recession is better represented by the SSF model. This is the case for the downstream Hérault catchments (no. 3a and no. 3b).
- A last group is where the SSF-DWF model simulates the rising flood and the high discharge slightly better, while the recession is better represented by the DWF model. The head watersheds of the Hérault (no. 3c and no. 3d) and of the Ardèche (no. 1d) catchments are in this group.
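The "clearly superior" rule used to build this grouping can be sketched as follows. The construction of the confidence interval (a percentile-based 90 % interval over a sample of scores) and the function name `clearly_superior` are assumptions made for illustration; the rule itself (lower bound of one model above the medians of all the others) follows the definition given above.

```python
import numpy as np

def clearly_superior(scores, level=0.90):
    """Return the models whose lower confidence bound exceeds the median
    score of every other model.

    scores: dict mapping a model name to a sample of scores (assumed input).
    """
    tail = 100.0 * (1.0 - level) / 2.0
    lower = {m: np.percentile(s, tail) for m, s in scores.items()}
    median = {m: np.median(s) for m, s in scores.items()}
    return [
        m for m in scores
        if all(lower[m] > median[other] for other in scores if other != m)
    ]
```

An empty list corresponds to the situation in which the models' results are not really discernible.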

Modelling errors inherent in the models' structures
For the sake of conciseness, only the simulation over one catchment is presented. Figure 10 shows the simulation results of the three models over the Ardèche catchment at Ucel (no. 1b). It shows the simulated hydrographs and their confidence intervals compared with the observed flows as well as the inherent errors in the simulations. This highlights the modelling errors due to the choice of model structure (DWF, SSF or SSF-DWF models). When the a priori confidence interval (grey colour) at a time $i$ does not cross the acceptability region (green colour), it means that no parameter set gives an acceptable simulation, and modelling errors due to the structure (or assumptions) of the model are consequently detected. When the posterior confidence interval (salmon colour) is outside the acceptability zone, the modelling error remains. Finally, the width of the prior (posterior) interval indicates the range of simulated values that the model's structure can reach: the wider the interval, the more uncertain the model prediction.
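The reading of Fig. 10 described above amounts to an interval-overlap test at each time step, which can be sketched as follows. The function name `diagnose_structure` and the representation of each interval as a pair of (lower, upper) arrays are hypothetical choices made for this illustration.

```python
import numpy as np

def diagnose_structure(zone, prior, posterior):
    """Flag time steps with structural errors and with remaining errors.

    zone, prior, posterior: (lower, upper) pairs of arrays giving the
    acceptability zone and the a priori / a posteriori confidence intervals.
    """
    zone_lo, zone_hi = (np.asarray(a, dtype=float) for a in zone)

    def outside(interval):
        lo, hi = (np.asarray(a, dtype=float) for a in interval)
        # No overlap: interval entirely below or entirely above the zone
        return (hi < zone_lo) | (lo > zone_hi)

    structural = outside(prior)     # no parameter set is acceptable
    remaining = outside(posterior)  # error persists after calibration
    return structural, remaining
```

A time step flagged as `remaining` but not as `structural` points to an error induced by the calibration rather than by the model's assumptions.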
Representing the soil column with either one compartment (the DWF model) or two compartments (SSF or SSF-DWF models) leads to distinct a priori confidence intervals of the modelling errors (grey). The DWF model constrains the simulated flows at the beginning of the event, before the onset of precipitation, because the width of the confidence interval of the modelling errors is low at that point. More specifically, it tends to underestimate the initialisation discharges, because the variation interval of the errors over this period is predominantly negative. This may explain this model's relative difficulty in reproducing the onset of floods, since the calibration of the parameters did not allow the acceptability zone in this part of the hydrograph to be reached. A resulting interpretation applicable to the catchment set is that good results in modelling the rising flood waters with the DWF model mean that the observed rising flow is relatively slow and could be reached in spite of the restrictive modelling structure (for example, no. 3a and no. 3b).
Likewise, it can be noted that the one-compartment structure (i.e. the DWF model) allows for flexibility in the modelling of high discharges and flood recessions, because the confidence interval of the modelling errors is quite large over these periods in the hydrograph. However, it also led to the underestimation of high discharges and flood recessions. In fact, the prior modelling error interval (in grey) has a negative bias with respect to the acceptability zone. The calibration finally allows the simulations to be selected at the intersection of the acceptability zone and the a priori confidence interval of the modelling errors. This generally corresponds to the calibration of a shallow depth of altered rock, $D_{WB}$, in order to make the model more sensitive to soil saturation and more responsive via the generation of early runoff. With that low $D_{WB}$, the simulated water storage capacity is limited, which might explain the inadequacy of the DWF model for catchments with small runoff coefficients (no. 2, Table 2).
Conversely, the two-compartment structure (the SSF and SSF-DWF models) offers flexibility in modelling the beginning of events, flood warnings and high discharges, but the ability to model flood recessions is more constrained. The SSF and SSF-DWF models simulate fast flood recessions in comparison to the DWF model, suggesting that good results in modelling the flood recession with the SSF model might be interpreted as a fast return to normal or low discharge being observed on the related catchments (for example, no. 2 and no. 4).
In the SSF and SSF-DWF models, the addition of a flux calibration parameter in the subsoil horizons not surprisingly leads to wider variations in the a priori modelling errors. A surprising finding, however, is that the calibration of the lateral conductivity of the deep layer, $C_{kdw}$, seems to affect only the simulation at the beginning of the hydrographs (see the events of 1 November 2011 and 13 November 2014, Fig. 10) and has very little effect on flood recessions. The strong similarity of the prior modelling intervals of the SSF and SSF-DWF models explains the similar performances of those models. In the same way, when the SSF-DWF model improves the performance, the improvement concerns the early rise of the flood; as the detailed performances have already shown, the SSF-DWF model enables a fast and early start of the flood events.

Analysis of relevance of the internal hydrological processes simulated

Characterisation of the hydrological processes simulated
The proportional volumes of the water making up the hydrographs, which arise from the three main simulated paths (on the surface, through the top or through the deep layer of the soil), were calculated. Figure 11 shows the simulated runoff contribution, i.e. the water that has not passed through the soil at any point. The contributions of these surface flows on the whole of the hydrograph (Fig. 11, left) and those that support high discharges (Fig. 11, right) are distinguished. Note that the other contributions are not detailed, being correlated to the runoff assessment and therefore leading to a similar analysis.
The runoff contribution simulated by the DWF model further discredits that model for representing the hydrological behaviour of the Gard (no. 2) and Salz (no. 4) catchments. Very high proportions of runoff contribution over the entire hydrograph were simulated, ranging from 40 % to 98 %. In contrast, the few experimental measurements made on the Gard (Bouvier et al., 2017; Braud et al., 2016a) provide evidence of the proportions of new water, which might be seen as an upper bound for the runoff contribution volume, ranging from 20 % to 40 % of the volumes in the hydrograph. The SSF and SSF-DWF models, conversely, gave more reasonable runoff contributions, although they remained high, ranging from 19 % to 62 %.
The assessment of the flow contributions through the simulations of the most suitable model for each catchment, as identified in Sect. 5.1, is consistent with the diversity of the catchment set. Considering the DWF model for the Ardèche catchment and the SSF and SSF-DWF models for the Gard catchment, the runoff contributions to the high flows of the hydrographs were slightly lower in the three downstream Ardèche catchments (no. 1a, no. 1b and no. 1c, with runoff contributions between 17 % and 57 %) compared to the runoff contributions in the Gard catchment (no. 2a, no. 2b and no. 2c) and in the upstream part of the Ardèche (no. 1d, with runoff contributions between 20 % and 78 %). This is consistent with both the properties of the catchments and the rainfall forcing, with the first catchment subset (no. 1a, no. 1b and no. 1c) having deeper soil cover and a more permeable soil texture (see Table 1) and being forced by rainfall with lower maximal intensities (see Table 2), in contrast to the second one (no. 2a, no. 2b and no. 2c).
On the downstream catchments of the Hérault (no. 3a, no. 3b), the variation intervals of the surface flows estimated by the three models overlap. This may explain why the three models can achieve good reproductions of the hydrological signal; from that integrated point of view, the calibration step makes it possible to obtain an analogous distribution of the flow processes. Notwithstanding the uncertainty related to the choice of the model when no model has been identified as most suitable on the basis of performance, the largest uncertainties are related to the parameterisation of the models, a consequence of the equifinality of the solutions when calibrating a hydrological model against the sole criterion of the reproduction of the hydrological signal. While several sets of parameters may be equivalent in terms of plausibility, even for the same model, these sets of parameters are likely to lead to different hydrological functioning.

Table 3. Realistic models and parameter sets for the Hérault catchment at Saint-Laurent-le-Minier (no. 3b). $C_\mathrm{soil}$: the contribution to the hydrograph of flows passing through the soil. $C_{kdw}/C^*_{kss}$: the value of the parameter $C_{kdw}$ for model DWF (Eq. 1) or the value of the parameter $C^*_{kss}$ for the model SSF (Eq. 2).

Detailed study of four plausible simulations on the Hérault watershed at Saint-Laurent-le-Minier
Spatialised and integrated changes in moisture levels and flow velocities generated within the catchments have been considered in order to give new details on the different impacts of the models' structure, but also to explain the resulting uncertainty when assessing the distribution of the flow processes. Next, four simulations, obtained from the DWF and SSF models and considered equally plausible according to the DEC criterion, are described (two simulations per model, see Table 3). The Hérault catchment at Saint-Laurent-le-Minier (no. 3b) has been considered because of the equivalence of the models in representing that catchment. Figure 12 compares the changes over time in the state of soil saturation and the different simulated flow velocities of the four model + parameter set configurations (Table 3). Figure 13 compares the spatial distributions of these variables at a given moment. In terms of hydrographs, the simulations differed very little, which is quite logical given the similar likelihood scores. The notable difference in the generation of hydrographs is the contribution of the different simulated flow paths. The proportions of water passing through the soil column (via sub- or surface-soil horizons) were highly variable, with an average of 39 % for the DWF2 model, 53 % for the SSF2 model, 61 % for the DWF1 model and 68 % for the SSF1 model (Table 3). This is due both to (i) the structural choices (DWF and SSF), which involved different saturation dynamics and the incorporation of different types of flow, and to (ii) the choice of the parameters, which involved flow velocities of different orders of magnitude.
The choice of a model's structure (DWF and SSF) implied differences in soil moisture spatial distribution and dynamics, which in turn impacted the timing of the flow processes. In the DWF structure, the soil moisture distribution is sensitive to the soil depth spatial distribution as a result of the decrease in the simulated intra-soil flows as a function of water … model produced a greater contrast in saturation levels between different areas of the catchment (Fig. 13a and d). With the SSF model, the overall catchment saturation level was more related to the topography; saturated cells were observed close to the drainage network, and, conversely, lower water content was observed in the upper reaches of the catchments.
In fact, for the SSF model, rainfall forcing is mainly involved in the saturation of the upper soil layer (the dashed lines in Fig. 12b), which reacts very rapidly to precipitation. As a result of the contrasting soil moisture dynamics, the flow velocities simulated in the soil showed corresponding differences. At the start of flooding, the SSF structure resulted in an early increase in flow velocities due to a higher and more homogeneous saturation level of the upper soil layer (Fig. 12c). Conversely, in the DWF model, which simulated a more heterogeneous spatial saturation of the catchment, the increase in simulated velocities was delayed, and the maximum values reached were 2 to 4 times lower.
The dynamics in the drainage network were impacted by the choice of the structure as well. With the SSF model, the average runoff velocities reflected the earlier onset of the subsurface flow processes through the fast saturation of the upper compartment (Fig. 12e). The DWF model yielded a more contrasting variation in the runoff velocities in the drainage network, mirroring variations in soil saturation levels.
The choice of parameters mainly implied different ranges of values for the velocities simulated in the soil, on the surface of the hillslope and in the drainage network. The calibration of the $C_{kss}$ and $C_{kdw}$ parameters controlled the order of magnitude of the subsurface velocities (Table 3 and Fig. 13b, e, h and k). The calibrated $C_k$ (infiltration capacity control) and $D_{WB}$ (depth of the subsoil horizon) parameters controlled the infiltration as well, leading to a larger or smaller number of cells where saturation excess occurred or the infiltration capacity was reached (Fig. 13c, f, i, l) and consequently to a higher or lower proportion of runoff over the hillslope (Fig. 12d).
Several orders of magnitude were actually allowed while respecting the calibration objective, because the transit times of the different water pathways compensate for each other. As foreshadowed by those four configurations, the selection of plausible parameter sets for any model in any catchment shows (i) a positive correlation between the parameter $C_k$ and the parameters $n_r$ and $n_p$, suggesting the necessity of slowing down flows in the drainage network when a larger proportion of runoff from the catchments is simulated (i.e. low $C_k$ would imply low $n_r$ and $n_p$ and vice versa) and (ii) a positive correlation between the $C_k$, $C_{kss}$ and $C_{kdw}$ parameters, suggesting the necessity of accelerating the intra-soil flows when a high infiltration rate is allowed and, consequently, when a larger proportion of subsurface flow is simulated. Thus, a degree of compensation occurs in the simulated transfer times between the various water paths from the hillslopes to the drainage network and from the drainage network towards the outlet.
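The compensation between transfer times can be illustrated with a deliberately simplified calculation, in which the total transit time is the sum of a hillslope travel time and a drainage-network travel time. This toy relation, the function name `transit_time_hours` and all numerical values are hypothetical and are not taken from the MARINE model; they only show how two different velocity combinations can lead to the same arrival time at the outlet.

```python
def transit_time_hours(hillslope_length_m, hillslope_velocity_ms,
                       channel_length_m, channel_velocity_ms):
    """Total transit time (h) as hillslope time plus drainage-network time."""
    hillslope_s = hillslope_length_m / hillslope_velocity_ms
    channel_s = channel_length_m / channel_velocity_ms
    return (hillslope_s + channel_s) / 3600.0

# Configuration A: slow hillslope transfer, fast network transfer
t_a = transit_time_hours(500.0, 0.05, 20000.0, 2.0)
# Configuration B: fast hillslope transfer, slower network transfer
t_b = transit_time_hours(500.0, 0.125, 20000.0, 1.25)
```

In this hypothetical example, both configurations give a total transit time of 20 000 s (about 5.6 h), although the hillslope velocities differ by a factor of 2.5; a criterion based only on the outlet hydrograph could therefore not distinguish them.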

On the hydrological functioning of the catchments studied
The benchmark of the models' performance on the catchment set reveals four subsets, suggesting four distinct hydrological behaviours. According to the modelling assumptions (Sect. 5.1), the resulting errors in simulating the different stages of the hydrographs (Sect. 5.2) and the catchment properties (Sect. 2.1), the hydrological behaviour of the catchments can be interpreted for each subset as follows:
- The SSF and SSF-DWF models showed better overall performance (with no particular pattern) in the first subset, the Gard (no. 2) and Salz (no. 4) catchments. This suggests, on the one hand, rapid catchment reactivity with fast rising flood waters as well as a fast flood recession, and on the other hand, the formation of the flows in the soil through local saturation tied to the climate forcing. Although the models exhibited similar performances, the contrasting physiographic characteristics of these catchments suggest that there are different explanations for this better fit of the SSF-DWF model. On the Gard, the very high intensities of the observed events (Table 2) and/or the low soil depth (Table 1) may explain the limitations on vertical infiltration due to the properties of the soil and/or geological bedrock. As a result, the rapid formation of a saturated zone at the top of the soil column favours runoff and a subsurface flux by activating preferential paths in the soil. This interpretation is in agreement with the field studies carried out on a schist upstream sub-catchment of the Gard, the schist substratum being the predominant geology of the Gard catchment (see Sect. 2.1; Ayral et al., 2005; Maréchal et al., 2009, 2013). On the Salz (no. 4), on the one hand, the soil is deeper and the precipitation intensities lower. On the other hand, the geological bedrock composed of marl, sandstone and limestone is assumed to have low permeability, and the soil is less conductive due to its predominantly silt-loam texture. As a result, despite the lower forcing intensities, the surface soil can reach saturation, which might explain why the SSF model offers the best fit.
- The considerable hydrological responses in terms of volume on the second subset, the Ardèche, appear to be linked to hydrological activity at depth, including that which takes place during intense floods, as suggested by the better fit of the DWF model. Here, in particular, the model gave a better representation of the relatively slow and uniform hydrological recessions from one event to the next, reflecting an aquifer-type flow whose discharge properties are governed only by the properties of the catchment bedrock. This interpretation is reinforced by the field studies carried out at the time in a granite experimental sub-catchment located in the downstream part of the Gard (Sect. 2.1; Ayral et al., 2005; Maréchal et al., 2009, 2013), the Ardèche catchment being granitic. The somewhat delayed flood timing that the structure of the one-compartment model imposed seems to indicate that there are rapid flows at the beginning of an event, which this model structure is not able to represent. A plausible explanation is the default calibration, which uses a uniform depth of active subsoil horizons, $D_{WB}$, during a flood. This might mask the appearance of local saturation zones and the subsequent runoff due to shallow soil and discontinuities in the permeable base layer (for example, in the downstream sedimentary layers, where infiltration tests have shown the appearance of runoff; see Sect. 2.1). In contrast, the SSF and SSF-DWF models did not display this weakness because the varying nature of soil depths ($D_{\mathrm{BDsol}}$, which determines the depth of the upper compartment) allowed for the rapid development of flows via preferential paths in the soil blocks, thus enabling the simulation of such local dynamics.
- The third subset consists of the downstream part of the Hérault (no. 3a and no. 3b). The models' performances contrasted with those obtained for the Hérault catchment heads (no. 3c and no. 3d), suggesting hydrological behaviours related to the contrasting geological properties. An interpretation of hydrological functioning is nevertheless not possible, given the similar overall results offered by the models and the fact that no distinctions can be drawn according to other criteria.
- The last subset consists of the catchment heads (no. 1d, no. 3c and no. 3d). We observed superior performances from the DWF and SSF-DWF models, with a particular improvement in the forecasting of rising flood waters when using the SSF-DWF model. This suggests the presence of several types of flow in the soil with strong support from flows at depth, which is consistent with the high mean inter-annual discharges associated with these catchments, and additionally the presence of rapidly formed flows, providing a good simulation of the rising flood waters. The fact that the SSF-DWF model, which is precisely intended to represent the simultaneous setting up of shallow and deep subsurface flows, did not completely outperform the two other models is interesting.
From our point of view, it points to the limit of their artificial implementation, which uses a threshold infiltration from the top layer to the deep one. In reality, the simultaneous set-up of the two fluxes more likely reflects the spatial heterogeneity of the soil properties within a catchment cell (2.5 km$^2$), especially in the head watersheds, which might allow either deep infiltration or fast topsoil saturation.

Overcoming the remaining uncertainty
The multi-hypothesis test presented here faced the classical equifinality issue related to the parameter uncertainty and highlighted the uncertainty related to the model's structure. The comparative and detailed description of the simulations revealed the controls exerted by the model's structure, thus giving almost direct guidelines to overcome the equifinality issue.
One of the objectives of the study, the assessment of the flow contributions to the hydrographs, was not completely reached, mainly because of the parameter uncertainty (Sect. 5.3.1). The benchmark of modelling configurations, scanning the different simulated processes (Sect. 5.3.2), showed how the calibration led to that uncertainty. The wide range of values allowed through the parameter setup enabled counterbalancing effects between the simulated internal velocities. As a direct consequence, variable flow contributions could be simulated while finally producing similarly likely hydrographs. This points to direct further objectives for improving and better constraining the calibration of the models. While several ranges of values for the internal flow velocities have been simulated, a reasonable restriction based on the likelihood of the velocities could be envisaged. This perspective should also steer experimental studies toward a better assessment of the water transit time along the different pathways at the hillslope scale, either using direct methods such as water isotope tracing (Tetzlaff et al., 2018), developing imaginative indirect ones such as diatom tracing (Pfister et al., 2017b), or taking advantage of suspended particle and water turbidity measurements.
The equifinality of the models in several catchments mostly points to the limits of assessing hydrological models solely through the discharge time series at the outlet. Leading up to a multi-criteria calibration, the detailed comparative description outlined the discrepancies between the simulations and thus provided guidelines for integrating judicious information to differentiate the models' adequacy. The distinct spatial saturation patterns generated by the DWF and SSF structures suggest the relevance of assessing the soil moisture distribution along hillslopes and soil heterogeneities, as the first structure implied a soil moisture dynamic related to local soil properties, while the latter implied a soil moisture pattern related to the distance to the drainage network. In addition, the description of the a priori modelling errors (Sect. 5.3.2) points the way towards an optimal consideration of the early rising limb and the flood recession when calibrating the models over the discharge time series. Indeed, the model's structure appeared to mostly control these particular stages, especially the simulated timing of the first stage and the simulated dynamic of the latter one. The consequent need for accurate discharge benchmarks, particularly during these stages, should direct river monitoring toward high-temporal-resolution measurements of the river level, since the rising and receding flood stages are short periods during flash floods, and toward efforts to reduce the uncertainty of the rating curve at low and moderate flows, including, for example, the hysteresis of the discharge curves (Le Coz et al., 2014), rather than toward obtaining extreme discharge measurements.

Conclusions
7.1 Summary of the study's objectives and methodology

The objective of the study was to improve our understanding of flash flooding in the French Mediterranean Arc. In particular, attention was paid to the dynamics of soil saturation in catchments during these events and their possible relationship with the physiographic diversity encountered. The method used consisted of considering hydrological models as a diagnostic tool to test hypotheses about the functioning of catchments.
Based on the structure of the MARINE model, a hydrological model with a physical and distributed basis, three types of dynamic of soil saturation were postulated and tested.In the first case (the DWF model), we assumed an aquifer dynamic with an infiltration at depth and the generation of a strong base support according to the volume of infiltrated water.In the second case (the SSF model), it was the activation of preferential paths at the soil-altered-rock interface that generated the majority of the flows passing through the soil, with the lower part of the soil column serving only as a storage reservoir.In the third case (the SSF-DWF model), there was flow generation via both the activation of preferential pathways, initially by the saturation of the top of the soil column, and a significant increase in the base flux via the subsequent infiltration of water present at deeper levels.
The same calibration strategy was used for the three models on a set of 12 catchments, which are representative of the diverse characteristics of the Mediterranean Arc. The goodness of fit of each model was evaluated on the basis of scores representing overall or partial model performance in terms of simulating the hydrographs, the proportions of the processes simulated, and the timing and form of flood recession.

Conclusions on our understanding of the processes involved
The specific use of a multi-hypothesis framework supports a clear comparison of the hydrological behaviours, which has in turn provided the main basis of the insights of this study.
From the application and validation of the three hydrological models, the 12 catchments of the study could be classified into four categories, including (i) the Gard and Salz catchments, for which the SSF model is better suited to reproducing the hydrological signal, highlighting the importance of local and surface soil dynamics in the generation of flows especially at the beginning of a flood, (ii) the Ardèche catchments, for which the DWF model most accurately reproduces the observed flows, which indicates more regular and integrated hydrological functioning at the catchment level, with the flows generated being directly related to the moisture history and rainfall volumes, (iii) the Hérault catchments at Valleraugue and La Terrisse and the Ardèche catchment at Meyras, which have steep-sloped catchment heads where the SSF-DWF model stands out, suggesting both sustained and significant hydrological activity at depth during flash floods and surface activity in the establishment of early flows at the beginning of events, and (iv) the Hérault catchments at Laroque and Saint-Laurent-le-Minier, for which no model shows any significant difference.
The modelling results help to draw consistent assumptions of hydrological behaviours, which corroborate, when available, the knowledge and observations on the overall hydrological functioning of the catchments or the experimental estimations of flow processes. The results suggest that the behaviour of catchments under extreme forcing is a continuation of the hydrological functioning normally encountered.
The assessment of the flow processes in the catchments remains uncertain, owing to the equifinality issue. The analysis of the internal processes enabled the explanation of the compensation effects between the simulated flow pathways and the resulting uncertainty of the calibrated parameter sets on the sole basis of the discharge time series. In addition, other detailed descriptions of the simulations, such as the spatial dynamic of the soil moisture distribution or the modelling errors, highlighted the actual impacts of the model's assumptions on the simulations. The revealed discrepancies between models, namely the range of values of the flow velocities, the spatial pattern of the soil moisture, the early rising limb timing and the recession rate of the hydrographs, finally defined pertinent milestones for improving the assessment of the model's adequacy.

Figure 1. Locations of the catchments studied, with a topographic visualisation at a resolution of 25 m (Source: IGN; MNT BDALTI).
as the simulated hydraulic conductivity at saturation and $D_{tot} = D_{BDsol} + D_{WB}$ as the soil column depth. Calibrated parameters are in bold.
$1 \ldots n$ and $y_i^{\text{DEC-5th}}$, $y_i^{\text{DEC-95th}}$, $i = 1 \ldots n$, respectively, where $y_i^{\text{prior-5th}}$ and $y_i^{\text{prior-95th}}$ are the 5th and the 95th percentiles of the 5000 model simulation values at time $i$, and where $y_i^{\text{DEC-5th}}$ and $y_i^{\text{DEC-95th}}$ are the 5th and the 95th percentiles of the same series weighted according to the DEC calibration criterion.
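The prior and weighted percentile bounds defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each simulation run carries one non-negative weight derived from the calibration criterion, and it inverts the weighted empirical distribution without interpolation. The function names and the array layout (runs in rows, time steps in columns) are hypothetical.

```python
import numpy as np

def weighted_percentile(values, weights, q):
    """q-th percentile (0-100) of `values` under non-negative `weights`.

    Uses the inverse of the weighted empirical CDF (no interpolation).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cdf = np.cumsum(weights[order]) / np.sum(weights)
    idx = np.searchsorted(cdf, q / 100.0, side="left")
    return values[order][min(idx, len(values) - 1)]

def percentile_bounds(sim, weights=None):
    """5th and 95th percentile series of an ensemble.

    sim: array of shape (n_runs, n_steps); weights: one weight per run,
    or None for the unweighted (prior) bounds.
    """
    sim = np.asarray(sim, dtype=float)
    if weights is None:
        return np.percentile(sim, 5, axis=0), np.percentile(sim, 95, axis=0)
    n_steps = sim.shape[1]
    low = np.array([weighted_percentile(sim[:, i], weights, 5) for i in range(n_steps)])
    high = np.array([weighted_percentile(sim[:, i], weights, 95) for i in range(n_steps)])
    return low, high

# Toy ensemble: 100 runs, one time step, simulated values 1..100.
sim = np.arange(1.0, 101.0).reshape(100, 1)
# Hypothetical weights that keep only the runs with values between 40 and 60.
weights = ((sim[:, 0] >= 40) & (sim[:, 0] <= 60)).astype(float)

prior_low, prior_high = percentile_bounds(sim)
post_low, post_high = percentile_bounds(sim, weights)
```

With these toy weights the weighted band sits inside the prior band, which is the expected effect of calibration on the spread of the ensemble.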
; a value of ${}^{\alpha\text{-}x\text{th}}_{i}$ equal to 0 indicates that the $y^{\alpha\text{-}x\text{th}}_{i}$ bound lies within the discharge confidence interval. If $0 < {}^{\alpha\text{-}x\text{th}}_{i} \leq 1$, the $y^{\alpha\text{-}x\text{th}}_{i}$ bound lies within the acceptability zone. If ${}^{\alpha\text{-}x\text{th}}_{i}$

Figure 7. (a): Mean inter-annual discharge (m³ km⁻² s⁻¹) for the catchments. (b): A posteriori distribution of the calibration of the subsoil horizon hydraulic conductivity in the SSF-DWF model (the $C_{kdw}$ parameter; Eq. 3).

Figure 8. Assessment of the models by catchment in the different stages of the hydrographs. (a): Qmed_INT scores calculated over the rising flood waters stage. (b): Qmed_INT scores calculated over the high discharges stage. (c): $A_{slope}$ scores. High Qmed_INT scores and, conversely, low $A_{slope}$ values indicate good performances of the model.

Figure 9. Summary of the models' benchmark. One colour is attributed to each score and each catchment when one model gives a clearly superior performance, and two colours are attributed when two models give clearly superior performances: the score of a model is defined as clearly superior when the lower bound of its confidence interval is higher than the median values obtained with the other models. The superiority of a model might be half attributed if the criterion is only met for the calibration processes. Colour attribution: orange for the DWF model, blue for the SSF model, green for the SSF-DWF model and grey when the superiority of any one model is undetermined.
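The attribution rule in this caption can be sketched in a few lines. The sketch is one possible reading of the rule, not the authors' code: it assumes that higher scores are better (a score such as the slope criterion, where low values are better, would need its sign flipped first), that a single model wins when its lower bound exceeds the medians of all other models, and that two models share the attribution when both exceed the medians of every remaining model. The function name and the score values are hypothetical.

```python
from itertools import combinations

def superior_models(scores):
    """Return the model(s) judged clearly superior, or [] if undetermined.

    scores: dict mapping model name -> (ci_lower_bound, median),
    for a score where higher is better.
    """
    names = list(scores)
    # Case 1: one model whose lower bound beats every other median.
    single = [a for a in names
              if all(scores[a][0] > scores[b][1] for b in names if b != a)]
    if single:
        return single
    # Case 2: two models that both beat the median of every remaining model.
    for a, b in combinations(names, 2):
        rest = [c for c in names if c not in (a, b)]
        if rest and all(scores[x][0] > scores[c][1] for x in (a, b) for c in rest):
            return [a, b]
    return []

# Hypothetical (lower bound, median) pairs for one catchment and one score.
one_winner = {"DWF": (0.80, 0.85), "SSF": (0.60, 0.70), "SSF-DWF": (0.65, 0.75)}
two_winners = {"DWF": (0.80, 0.85), "SSF": (0.50, 0.60), "SSF-DWF": (0.78, 0.86)}
undetermined = {"DWF": (0.60, 0.70), "SSF": (0.60, 0.70), "SSF-DWF": (0.60, 0.70)}
```

An empty result corresponds to the grey cells of the figure, where no model stands out.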

Figure 10. Calibration of the three models for the Ardèche catchment at Ucel (no. 1b). The results of the simulation of five flood hydrographs and the inherent modelling errors (Eq. 10) are shown for each model (a: DWF, b: SSF and c: SSF-DWF). The median simulation and the posterior confidence interval are shown in red and salmon, respectively. The confidence intervals of the measured flows and the acceptability zone are shown in green and blue, respectively. The a priori confidence intervals for each model (i.e. with no calibration) are shown in grey. Calibration events are marked with (*) and validation events with (**).

Figure 11. Proportion of surface runoff in the flows at the outlet. Left: the proportion over the whole hydrograph. Right: the proportion at high discharges (observed flow greater than 0.25 times the maximum flow during the event).
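The two proportions shown in this figure can be sketched as below. The sketch assumes that the proportion is a ratio of simulated volumes over a uniform time step, and that the high-discharge stage is selected from the observed series using the 0.25 threshold stated in the caption; the function name and the toy series are hypothetical.

```python
import numpy as np

def surface_runoff_share(q_obs, q_sim_total, q_sim_surface, threshold=None):
    """Share of surface runoff in the simulated outlet flow.

    q_obs, q_sim_total, q_sim_surface: series on the same uniform time step.
    threshold: None for the whole hydrograph, or a fraction of the observed
    peak (e.g. 0.25) to keep only the high-discharge time steps.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim_total = np.asarray(q_sim_total, dtype=float)
    q_sim_surface = np.asarray(q_sim_surface, dtype=float)
    if threshold is None:
        mask = np.ones(q_obs.shape, dtype=bool)
    else:
        mask = q_obs > threshold * q_obs.max()
    return q_sim_surface[mask].sum() / q_sim_total[mask].sum()

# Toy event (arbitrary discharge units); values are illustrative only.
q_obs = [10.0, 50.0, 100.0, 40.0, 20.0]
q_sim_total = [10.0, 50.0, 100.0, 40.0, 20.0]
q_sim_surface = [0.0, 20.0, 60.0, 10.0, 0.0]

share_whole = surface_runoff_share(q_obs, q_sim_total, q_sim_surface)
share_high = surface_runoff_share(q_obs, q_sim_total, q_sim_surface, threshold=0.25)
```

In the toy event the surface share is larger at high discharges than over the whole hydrograph, because the low-flow time steps excluded by the threshold carry no surface runoff.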

Figure 12. Comparison of the results of four equally plausible simulations for the Hérault at Saint-Laurent-le-Minier (Table 3). (a) Flood hydrographs (solid lines) and outlet flows transiting via the soil (dashed lines). (b) Evolution in the overall moisture content of the soil column. (c) Evolution in simulated mean velocities in the subsoil horizon (DWF model) and in the upper part of the soil column (SSF model). (d) Average runoff velocities on the hillslopes. (e) Average runoff velocities in the drainage network. Calibration events are marked with (*) and validation events with (**).

Table 1. Physiographic properties and hydrological statistics of the 12 catchments.