Articles | Volume 27, issue 3
Research article
09 Feb 2023

Deriving transmission losses in ephemeral rivers using satellite imagery and machine learning

Antoine Di Ciacca, Scott Wilson, Jasmine Kang, and Thomas Wöhling

Transmission losses are the loss in the flow volume of a river as water moves downstream. These losses provide crucial ecosystem services, particularly in ephemeral and intermittent river systems. Transmission losses can be quantified at many scales using different measurement techniques. One of the most common methods is differential gauging of river flow at two locations. An alternative method for non-perennial rivers is to replace the downstream gauging location by visual assessments of the wetted river length on satellite images. The transmission losses are then calculated as the flow gauged at the upstream location divided by the wetted river length. We used this approach to estimate the transmission losses in the Selwyn River (Canterbury, New Zealand) using 147 satellite images collected between March 2020 and May 2021. The location of the river drying front was verified in the field on six occasions and seven differential gauging campaigns were conducted to ground-truth the losses estimated from the satellite images. The transmission loss point data obtained using the wetted river lengths and differential gauging campaigns were used to train an ensemble of random forest models to predict the continuous hourly time series of transmission losses and their uncertainties. Our results show that the Selwyn River transmission losses ranged between 0.25 and 0.65 m3 s−1 km−1 during most of the 1-year study period. However, shortly after a flood peak the losses could reach up to 1.5 m3 s−1 km−1. These results enabled us to improve our understanding of the Selwyn River groundwater–surface water interactions and provide valuable data to support water management. We argue that our framework can easily be adapted to other ephemeral rivers and to longer time series.

1 Introduction

Transmission losses are the loss in the flow volume of a river as water moves downstream (Walters, 1990). An important consideration of this definition is that transmission loss refers to all of the water lost by a river: evaporation, transpiration by macrophytes and riparian vegetation, as well as groundwater recharge (McMahon and Nathan, 2021). In dryland regions, where water scarcity is a major issue, rivers are often ephemeral or intermittent (i.e. non-perennial) and are thought to be the primary source of groundwater recharge (Shanafield and Cook, 2014; Wang et al., 2017). In addition, intermittent and ephemeral rivers shelter specific freshwater biodiversity and play an important role in biogeochemical cycles (Datry et al., 2014; Fovet et al., 2021). Interactions between non-perennial rivers and groundwater can be particularly complex with, for example, the development of perched aquifers during high flows (Shanafield et al., 2021; Villeneuve et al., 2015; Wheater et al., 2010).

The quantification of transmission losses and groundwater–surface water interactions has been approached in many different ways (Cook, 2015; Kalbus et al., 2006). The methods used to estimate transmission losses can be classified into three groups following Shanafield and Cook (2014), depending on whether they rely on measurements of streambed infiltration, groundwater state variable(s) or river discharge. Estimating the streambed infiltration typically gives point estimates and can be done directly with seepage meters (e.g. Lee, 1977; Lee and Cherry, 1979; Rosenberry et al., 2020) or indirectly using tracers (e.g. González-Pinzón et al., 2015; Hatch et al., 2006; Le Lay et al., 2019). However, as stated by Cook (2015), small-scale estimates cannot be easily extrapolated to larger scales relevant for water management. Another way to approach this problem is to conduct measurements in the groundwater in order to determine the river recharge response signal. This can provide larger-scale estimates of the groundwater recharge by means of hydraulic (e.g. McDonald et al., 2013) or chemical measurements (e.g. Hoehn and Von Gunten, 1989; Massmann et al., 2009; Popp et al., 2021; Schaper et al., 2022). Unfortunately, these estimations are often complicated by the amount of information needed on the aquifer properties, which cannot be easily estimated at the appropriate scale. Finally, transmission losses can be quantified by differential gauging of river flow at two locations. Although river flow is routinely measured in many hydrological studies, these measurements are rather labour-intensive and it is difficult to record high-flow events, which occur over very short periods. An easier way to generate river discharge time series is to monitor the river level and generate a stage–discharge rating curve to determine discharge. However, the use of a rating curve introduces uncertainties in the river discharge values, which can be considerable and are often underestimated (Di Baldassarre and Montanari, 2009; McMahon and Peel, 2019; McMillan et al., 2012). These uncertainties become even bigger when two river gauging stations are used to calculate the transmission losses, as the uncertainties are compounded. For ephemeral rivers, an alternative approach using satellite observations was introduced by Walter et al. (2012). In this approach, the length of the wetted reach downstream of a flow gauging station is visually identified on satellite images. The transmission losses can then be calculated by dividing the river flow at the gauging station by the wetted river length. Walter et al. (2012) used this approach to calculate the transmission losses in the Frio River (Texas, United States) using five images collected between 1994 and 2008.

In combination with measurements, transmission losses and groundwater–surface water interactions can also be quantified using models (Fleckenstein et al., 2010; Lewandowski et al., 2020; McMahon and Nathan, 2021). A wide variety of models have been used for this purpose. Early attempts include a linear relationship between the flow rate and the river–aquifer head difference, based on a constant streambed resistance only (Prickett and Lonnquist, 1971). This relationship is still widely used nowadays as it is implemented in the popular MODFLOW family of codes (Harbaugh, 2005; Harbaugh et al., 2000; Langevin et al., 2017; McDonald and Harbaugh, 1988). However, numerous studies have suggested that this is an oversimplification of the system in many cases, and some proposed alternative expressions (Anderson, 2005; Di Ciacca et al., 2019; Morel-Seytoux et al., 2018; Rupp et al., 2008; Rushton, 2007; Rushton and Tomlinson, 1979). Nevertheless, these alternative expressions rely themselves on numerous assumptions that make them often unsuitable for representing the complex interactions between groundwater and non-perennial rivers. Recently, fully coupled models have been developed with the aim of representing the interactions between groundwater and surface water in all their complexity (e.g. Fatichi et al., 2016; Kuffour et al., 2020; Maxwell et al., 2009; Therrien et al., 2010). However, this complexity and the resultant data requirements make them difficult tools to use. Moreover, they need to be calibrated and evaluated on independent data in order to demonstrate their benefits over simpler solutions.

An alternative approach that has gained popularity in the hydrological modelling community over the last decades is machine learning (Shen et al., 2021; Solomatine and Ostfeld, 2008; Tran et al., 2021). These algorithms can be very efficient at reproducing the response variable (e.g. transmission losses) with minimum user assumptions, provided that enough training data are available. A machine learning algorithm particularly capable of representing non-linear and complex relationships between variables is random forest. This approach builds an ensemble (a forest) of small decision trees for the response variable by subsampling the predictor data using random combinations of predictor variables. The results of the “forest” are aggregated to determine the ensemble majority (classification) or average (regression) result for the response variable (Breiman, 2001; James et al., 2013). Random forests have been successfully used in hydrogeology to predict the origin of samples, nitrate contamination and redox conditions in groundwater (Baudron et al., 2013; Knoll et al., 2019; Koch et al., 2019; Rodriguez-Galiano et al., 2014; Wilson et al., 2020). Despite being less common than other machine learning approaches (e.g. artificial neural networks, support vector machines), random forests have also been used in hydrology, including for estimating various hydrological indices at ungauged sites and streamflow forecasting (Booker and Woods, 2014; Desai and Ouarda, 2021; Papacharalampous and Tyralis, 2018; Tyralis et al., 2019).

In the coastal plains of New Zealand, most of the groundwater recharge is thought to be sourced from gravel-bed river water infiltration. For example, the annual land recharge is only around 3 % of the river recharge in the Heretaunga Plains (Dravid and Brown, 1997) and contributes to less than 4 % of the water balance in the Wairau Aquifer (Wöhling et al., 2018). In the Central Plains of the Canterbury region, the Waimakariri River provides more than 80 % of the spring-fed Avon River baseflow and is the major source of groundwater for the Christchurch city area (White, 2009; White et al., 2012). In these regions, groundwater resources are under increasing pressure to meet the demand for municipal, agricultural, and industrial uses (Brown et al., 1999; Rosen and White, 2001; Smith and Montgomery, 2004; Wöhling et al., 2020). The most important rivers for groundwater recharge in New Zealand often have a high braiding intensity, with several channels resulting in wide braidplains (>1 km). Interactions between braided rivers and groundwater have received little attention so far, and the quantity of water lost by these rivers and the main recharge mechanisms involved are still largely unknown. This makes any simulation of plausible future scenarios very delicate. Recently, Coluccio and Morgan (2019) published a review of methods for measuring groundwater–surface water exchange in braided rivers, highlighting the difficulties inherent to this kind of river. In the Central Plains of Canterbury, the Selwyn River has been previously used as a benchmark system for undammed alluvial rivers that are under intense pressure for water abstraction (Arscott et al., 2010; Datry et al., 2007; Larned et al., 2011, 2010, 2008, 2007; Rupp et al., 2008). Its relatively small width and low braiding intensity (one to two channels most of the time) allow for easier instrumentation and investigation than larger braided rivers. Furthermore, the Selwyn River includes an ephemeral losing reach, for which we could derive an extensive dataset of transmission losses using satellite imagery.

In this article, we present a framework to estimate transmission losses from satellite imagery and predict their time series using random forest regressors. To estimate the transmission losses using satellite imagery, we used a similar approach to Walter et al. (2012) but on a more comprehensive library, with different image sources and with field data to verify our estimations. We then used the transmission loss point data obtained to train an ensemble of random forest models. This ensemble enables us to predict the continuous hourly time series of transmission losses and their uncertainties. This constitutes another novelty of our approach.

The paper is organized as follows. Section 2 presents our study site on the Selwyn River (New Zealand). Next, Sect. 3 details the methods adopted to gauge the river flow, estimate the river transmission losses and predict the hourly time series. In Sect. 4, first the results of one flood event are described, second our complete dataset is analysed and third the predicted hourly time series are presented. Finally, Sect. 5 discusses the physical interpretation, the advantages and limitations of our approach and possible future developments and applications before Sect. 6 concludes with a summary of the most important findings.

2 Study site

The Selwyn/Waikirikiri River flows for 93 km from the foothills of the New Zealand Southern Alps across the alluvial Central Plains of the Canterbury region to Lake Ellesmere and the Pacific Ocean (Figs. 1 and 2a). The river course mainly follows a depression between the alluvial fans of the much larger Waimakariri and Rakaia rivers. In the foothills, the Selwyn River is constrained by hillslopes and has a meandering single-thread channel. This constrained reach is perennial and gains water from the surrounding hills. When the Selwyn River reaches the alluvial plains, it first arrives in the inland plains, which are formed by the apex of the alluvial fan and are dominated by glacial and periglacial outwash. The Selwyn channel slope decreases abruptly, and it becomes braided or semi-braided. The 3 km-long perennial reach loses water to the underlying aquifers due to the thickening of the gravel assemblage as the river leaves the confines of the foothills. As the transmission loss increases, the river becomes ephemeral for around 30 km of its length. Further downstream, the Selwyn River reaches the coastal plains, which are dominated by post-glacial alluvium and marine sediments, and gains water from groundwater seepage. The Selwyn River first becomes intermittent and then perennial again as the coast is approached (Larned et al., 2008; Rupp et al., 2008; Taylor et al., 1989). The lag time analysis performed by Larned et al. (2008) suggests that it takes several weeks for the water to infiltrate from the upstream losing river section to a deeper aquifer (∼20 m deep). Part of this water might be captured by the downstream gaining section of the river after travelling underground in a complex network of aquifers. Two long-term gauging stations record flow along the Selwyn River, one in the upstream section at Whitecliffs and one in the downstream section at Coes Ford (Fig. 2a).

In this study, we focus on the first part of the ephemeral losing reach, extending for 15 km upstream of the confluence with the Waianiwaniwa and Hororata rivers (Figs. 1 and 2a). The studied reach flows through the inland plains, which are dominated by glaciofluvial gravels covering Cretaceous and Tertiary sedimentary basement rock to depths of 120–160 m (Taylor et al., 1989; Wilson, 1973). In this region, aquifers are complexes of interbedded gravels, partially separated by leaky aquicludes. Groundwater flows sub-parallel to the direction of the Selwyn River, following the topographic gradient and the anisotropic permeability in the aquifer gravels (Burden, 1984). Aquifers in this region are recharged by water leaking through river channels and infiltrating through the land surface. Three aquifers have been identified between the Selwyn River and the basement rock (Vincent, 2005). Recently, Banks et al. (2022) described an additional thin (3–4 m) and highly permeable aquifer associated with the Selwyn River, referred to as the “braidplain aquifer”. Hyporheic exchanges and parafluvial flows occur within this shallow aquifer, which leads to very dynamic interactions between the river and the braidplain aquifer and an alternation of losing and gaining sections. However, the studied reach and its braidplain aquifer are always losing water to the deeper aquifer overall, even during high floods. Surface runoff is limited by the flat topography, high soil permeability (gravels) and absence of a tributary along this section of the river. The deeper aquifer water table is much lower (∼15 m) than the river and its braidplain aquifer. Based on water level and temperature data, it can be considered that an unsaturated zone separates the deeper aquifer from the river and the shallow braidplain aquifer (Banks et al., 2022). Our perceptual model is presented in Fig. 2 by means of two cross sections, one along (a) and one across (b) the Selwyn River, representing the river, the braidplain aquifer and the first deeper aquifer. The Selwyn climate, geology, hydrology and geomorphology have been extensively described by Larned et al. (2008).

Figure 1. Map of the Selwyn River with the river gauging monitoring station, the eight manual gauging cross sections and the 153 river drying front locations. The different reaches were delimited according to Larned et al. (2008).

Figure 2. Schematic cross sections along (a) and across (b) the Selwyn River, its braidplain aquifer and the first deeper aquifer.


3 Methods

3.1 River discharge time series

3.1.1 River discharge measurement

The river discharge time series was derived from the river stage, monitored continuously at a stable cross section, and a stage–discharge rating curve that relates the discharge to the recorded stage. The river stage was monitored at the upstream boundary of the ephemeral losing reach (referred to as “Scotts Road”, Fig. 1) for the period from March 2020 to May 2021 with a Seametrics PT12 pressure transducer (5 m range). The typical accuracy of this sensor is 2.5 mm and the associated uncertainties were propagated to the rated flows. The river stage is reported as a height of water above a local datum around 209 m a.s.l. The stage–discharge rating curve of the Selwyn River at Scotts Road (Fig. 3) was developed using 14 manual flow measurements collected from April 2020 to March 2021 using either a SonTek FlowTracker or an acoustic Doppler current profiler (ADCP, RDI StreamPro). These manual discharge measurements ranged from 0.22 to 10.12 m3 s−1 and were conducted when the river stage was between 0.86 and 1.3 m. At the cross section where the stage was recorded, there is one notable widening that caused a change in the correlation between stage and discharge above 1.12 m, and therefore we introduced one break of slope into our rating curve.

The uncertainties of the manual gauging data varied from 2.4 % to 6.5 %. The fitting errors between our manual flow measurements and the rating curve ranged from 0 % to 15 %, with an average of 5 % and a standard deviation of 7 %. Considering these two sources of uncertainty, we assumed an uncertainty of 20 % in the rated river flows.

Figure 3. Stage–discharge rating curve (discharge was log10 transformed). The horizontal lines represent the range of river stages monitored during the study period.


3.1.2 Hydrograph processing

The hydrograph obtained from the stage record and the rating curve was processed in order to extract the peaks higher than 0.3 m3 s−1. First, we identified each peak by automatically finding the time at which the first-order derivative became negative. The peaks were then filtered using an iterative procedure to select only those higher than 0.3 m3 s−1 and no more than one peak per 48 h. The peak height was taken as the difference between the peak flow value and the minimum before the peak. The hydrograph and the selected peaks are presented in Fig. 4. These peaks were used to calculate the time since the last peak and the peak height associated with each transmission loss estimate. The time since the last peak and the peak height were used to understand and predict the transmission loss dynamics.
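The peak selection described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in this study: the function name `select_peaks`, the greedy largest-first filter standing in for the "iterative procedure", the application of the 0.3 m3 s−1 threshold to the peak flow value, and the reading of "minimum before the peak" as the minimum since the previously selected peak are all assumptions made for illustration.

```python
def select_peaks(times_h, flows, min_flow=0.3, min_gap_h=48.0):
    """Return (time, peak flow, peak height) tuples for the selected peaks.

    times_h: sample times in hours; flows: discharge in m3 s-1.
    """
    # Candidate peaks: the discrete first derivative turns from
    # positive to negative (flat-topped peaks are not handled here).
    candidates = [
        i for i in range(1, len(flows) - 1)
        if flows[i] - flows[i - 1] > 0 and flows[i + 1] - flows[i] < 0
    ]
    # Assumption: the threshold applies to the peak flow value.
    candidates = [i for i in candidates if flows[i] > min_flow]

    # Assumed filter: visit candidates from largest to smallest and keep
    # one only if no already-kept peak lies within min_gap_h of it.
    kept = []
    for i in sorted(candidates, key=lambda k: flows[k], reverse=True):
        if all(abs(times_h[i] - times_h[j]) >= min_gap_h for j in kept):
            kept.append(i)
    kept.sort()

    # Peak height: peak flow minus the minimum flow since the previous
    # selected peak (or since the start of the record).
    peaks, start = [], 0
    for i in kept:
        trough = min(flows[start:i + 1])
        peaks.append((times_h[i], flows[i], flows[i] - trough))
        start = i
    return peaks
```

The time since the last peak for any estimate is then the difference between its timestamp and the most recent selected peak time.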

Figure 4. Hydrograph and flow peaks selected to calculate the time since the last peak and the peak height associated with each transmission loss estimate.


3.2 Estimation of the river transmission losses

We have estimated the Selwyn River transmission losses following two different approaches. The first is a similar approach to that adopted by Walter et al. (2012), who identified the length of the wetted reach downstream of a flow gauging station on five satellite images and calculated the transmission losses by dividing the river flow at the gauging station by the wetted river length. However, we used a much more comprehensive library of satellite images, and this constitutes the originality of our study. The second is a more traditional differential gauging approach and is used as a comparison on several days.

3.2.1 Transmission losses derived from the river drying front locations

The average river transmission losses along the reach downstream of our gauging station (qloss, L2 T−1) were calculated by dividing the river discharge (Q, L3 T−1) by the wetted river length (L, L).

$$ q_{\mathrm{loss}} = \frac{Q}{L} \qquad (1) $$

L was estimated by measuring the wetted river length from the gauging station to the river drying front location.

The Selwyn River drying front was located on 147 satellite images taken between March 2020 and May 2021. We used satellite images available in the Planet Monitoring collection, which are mainly taken by the Dove satellite constellation and provide 3.7 m-resolution images of the entire Earth daily in four multi-spectral bands: RGB (red, green, blue) and near infrared (Planet Team, 2017). Additionally, the drying front location was verified in the field on 6 different days in March 2021 using a GPS (Global Positioning System) device (Trimble R10 with a centimetre-level accuracy). The locations of the 153 drying fronts along the riverbed are presented in Fig. 1.

We considered two sources of uncertainty in the wetted river length estimation. The first one is related to the difficulty in identifying accurately the drying front location on the satellite images. A comparison between the GPS and satellite drying front positions showed us that this uncertainty could be up to 100 m. The second source of uncertainty is the determination of the distance between the drying front and the gauging station (i.e. wetted river length). The wetted river length can differ depending on whether the river active channel (where the water is flowing at low flow), the gravel riverbed or the braidplain is followed. The different lengths determined on the 27 January 2021 image are shown in Appendix A as an example. In this study, we adopted the intermediate option of following the riverbed but assumed 10 % uncertainties in the wetted river length estimations to account for this vague definition. The transmission loss estimates derived from satellite images include these two sources of uncertainty, while the estimates made using the GPS points include only the second one.
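As an illustration of how these uncertainty ranges could translate into a range of transmission loss values, the sketch below samples the discharge and the wetted length from uniform distributions and applies Eq. (1) to each sample. It is a hedged sketch only: the function name `sample_losses`, the additive combination of the 10 % length uncertainty with the 100 m drying front position uncertainty, and the example values used afterwards are assumptions, not the processing chain of this study.

```python
import random


def sample_losses(q_m3s, length_km, from_satellite, n=10_000, seed=0):
    """Sample q_loss = Q / L (m3 s-1 km-1) within the stated uncertainty ranges."""
    rng = random.Random(seed)
    # Drying front position uncertainty (up to 100 m) applies to
    # satellite-derived fronts only, not to GPS-surveyed ones.
    front_err_km = 0.1 if from_satellite else 0.0
    if 0.9 * length_km - front_err_km <= 0:
        raise ValueError("wetted length too short for the assumed bounds")

    losses = []
    for _ in range(n):
        # 20 % uncertainty assumed on the rated river flow.
        q = rng.uniform(0.8 * q_m3s, 1.2 * q_m3s)
        # 10 % uncertainty on the wetted length (path definition).
        length = rng.uniform(0.9 * length_km, 1.1 * length_km)
        # Assumption: the position error adds to the path-length error.
        length += rng.uniform(-front_err_km, front_err_km)
        losses.append(q / length)
    return losses
```

For example, `sample_losses(2.0, 5.0, from_satellite=True)` returns samples scattered around 0.4 m3 s−1 km−1 for a hypothetical discharge of 2 m3 s−1 and a hypothetical wetted length of 5 km. The relative spread grows quickly as the wetted length approaches a few hundred metres, because the fixed 100 m position error then dominates.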

3.2.2 Transmission losses derived from differential flow gauging

We conducted seven differential flow gauging surveys close to the upstream boundary of the ephemeral losing reach. During each survey, the river flow was measured at eight cross sections. Some cross sections included multiple braids; this resulted in 12 gauging locations along a river reach of 700 m covering three riffle-pool sequences (Fig. 1). The uncertainties of these manual flow measurements depend on instrument and site constraints. For our measurements, the relative uncertainties were estimated between 2.7 % and 6.3 %; the higher relative uncertainties are typically associated with shallow and low flow in the smaller braids.

The reach-scale average transmission losses were calculated by fitting linear models to the relationships between the river discharge and the distance from the first upstream gauging location. The transmission loss values are the slope of the linear models. To transfer the measurement uncertainties to the transmission loss estimates, we have fitted a linear model to each of 10 000 realizations sampled in the uniform distributions representing the measurement uncertainty ranges. The flow gauging measurements, their uncertainties and the linear model ensembles used to calculate the transmission losses are presented in Fig. 5.
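The Monte Carlo fitting step can be sketched as follows, using an ordinary least-squares slope computed with the Python standard library. The function names, the sign convention (loss reported as the negative of the fitted slope, so that a losing reach gives a positive value) and the treatment of each gauging as an independent uniform draw are assumptions of this sketch.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def loss_ensemble(dist_km, flow_m3s, rel_unc, n=10_000, seed=0):
    """Fit one linear model per realization and return the losses.

    dist_km: distance from the first upstream gauging location;
    flow_m3s: gauged discharge; rel_unc: relative uncertainty per gauging.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n):
        # Draw each gauged flow uniformly within its uncertainty range.
        sampled = [
            rng.uniform(q * (1 - u), q * (1 + u))
            for q, u in zip(flow_m3s, rel_unc)
        ]
        # Assumed sign convention: loss = -(dQ/dx), in m3 s-1 km-1.
        losses.append(-ols_slope(dist_km, sampled))
    return losses
```

The minimum and maximum (or chosen quantiles) of the returned list then describe the uncertainty range of the reach-average loss for one survey.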

The small-scale variability (between individual gaugings) is due to complex interactions between the river and the braidplain aquifer. The linear models were used to estimate the reach-average transmission losses over three riffle-pool sequences and remove the localized loss/gain variability. Thus, the loss values derived from the linear models can be directly compared to the losses estimated using the satellite imagery approach. The description and explanation of this small-scale variability are beyond the scope of this study and have already been partly addressed by Banks et al. (2022). More comprehensive investigations will be the focus of future work.

Figure 5. Differential flow gauging measurements and linear model ensembles used to calculate the transmission losses.


3.3 Time series prediction using random forest regression models

Random forest regression models were trained on a dataset including the estimates obtained from the satellite images, the field GPS points and the differential gauging surveys. These models enable us to predict the hourly reach-average transmission losses for the wetted reach downstream of the flow gauging station on the days and times without measurements. This provides us with a continuous hourly transmission loss time series covering the entire study period. We used the “tidymodels” framework implemented in the R language (Kuhn and Wickham, 2020) and the “ranger” implementation of random forest (Wright and Ziegler, 2017) with 1000 trees per forest. The random forests were trained with three predictor variables: the river stage, the time since the last peak (log10 transformed) and the height of this peak. In the course of the model development, more predictors (e.g. river flow, water temperature, groundwater level, date) were tested, but they did not significantly improve the predictions in terms of root mean square error (RMSE). We selected the model with the lowest dimension among the better-performing ones. The number of predictors randomly selected at each split was set to two and the minimal node size to one. We used 75 % of the data to train the models and kept the other 25 % for testing them. A stratified sampling was applied to ensure that the distribution of the time since the last peak was similar in the training and testing datasets. For more clarity, we refer to the transmission losses predicted by the random forest models as “predicted” as opposed to the “estimated” values from the field data and satellite images.
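The stratified 75 %/25 % split can be illustrated independently of the R tooling. The Python sketch below bins the records into quantile classes of the stratification variable and splits each class separately. The function name `stratified_split`, the use of four bins and the record layout are assumptions, and the sketch does not reproduce the tidymodels implementation.

```python
import random


def stratified_split(records, key, n_bins=4, train_frac=0.75, seed=0):
    """Split records into train/test sets, stratified on key(record)."""
    rng = random.Random(seed)
    ordered = sorted(records, key=key)
    size = len(ordered)
    train, test = [], []
    for b in range(n_bins):
        # Equal-count quantile bin of the stratification variable.
        members = ordered[size * b // n_bins: size * (b + 1) // n_bins]
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test
```

Here `key` would return the log10-transformed time since the last peak of each estimate, so that both subsets contain estimates made shortly after a peak as well as estimates made long after it.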

To propagate the uncertainties of our estimated transmission losses through the modelling, we trained a random forest on each of 10 000 realizations sampled in the uniform distributions representing the estimated uncertainty ranges. For each realization, different training and testing datasets were selected. Thus, we obtained an ensemble of random forests that we used to represent the uncertainties in the predicted values. The use of random forests is advantageous in this case because they are computationally fast, particularly when implemented in ranger, which is also memory efficient (Wright and Ziegler, 2017). This efficiency enables an ensemble to be generated for the purpose of describing uncertainties, an approach that would be difficult with other machine learning methods that are more computationally demanding.

Given the stochastic nature of our estimation and modelling, the evaluation of the random forest fits against the estimated losses gives us multiple residual values for each estimation. We report hereafter the average RMSE and the average normalized RMSE (NRMSE, normalized by the mean) of the 10 000 realizations. These evaluation metrics assess how well the random forest realizations could fit the training and testing data points, sampled in the uniform distributions representing the estimated uncertainty ranges. Furthermore, to evaluate the ability of our ensemble to reproduce the estimated transmission losses, we report the RMSE and NRMSE of the average predicted versus average estimated values.
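For reference, the two evaluation metrics can be written as a short Python sketch. The function names are illustrative, and normalizing by the mean of the estimated (rather than the predicted) values is an assumption.

```python
import math


def rmse(predicted, estimated):
    """Root mean square error between predicted and estimated values."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(predicted, estimated):
    """RMSE normalized by the mean of the estimated values (assumption)."""
    return rmse(predicted, estimated) / (sum(estimated) / len(estimated))
```

The average RMSE over the ensemble is obtained by applying `rmse` to each realization and averaging the results, whereas the second pair of metrics applies the same functions once to the ensemble-average predicted and estimated values.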

Lastly, we computed a transmission loss duration curve by calculating the exceedance probability of the predicted hourly values, in the same way as flow duration curves are generated. This duration curve is homologous to a cumulative frequency curve. This analysis was done considering 1 year of data from 1 May 2020 to 1 May 2021.
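A duration curve of this kind can be sketched in a few lines of Python. The Weibull plotting position, rank / (n + 1), is an assumption of this sketch, as other plotting positions are also in common use, and the function name is illustrative.

```python
def duration_curve(values):
    """Return (exceedance probability, value) pairs, largest value first."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    # Weibull plotting position (assumed): rank / (n + 1).
    return [(rank / (n + 1), v) for rank, v in enumerate(ordered, start=1)]
```

Reading the curve at an exceedance probability of 0.5 gives the loss value exceeded during half of the hours of the analysed year.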

4 Results

In this section, we first explain how the reach-average transmission losses downstream of our gauging station vary in time for one particular event in September 2020, then show the complete dataset of estimated values and lastly present our predicted time series.

4.1 September 2020 flood event

The flood event occurring on 18 September 2020 was selected for explaining the transmission loss behaviour because the satellite imagery coverage was particularly good. This allowed us to monitor the transmission loss time dynamic during the first days after peak flow (Fig. 6). Furthermore, a differential gauging field campaign was conducted on 24 September 2020, a day for which we also have a satellite image. This enables a comparison between the two approaches 6 d after peak flow and thus a verification of our method.

During this event, the peak flow at the permanent gauging station was reached around 09:00 on 18 September. However, the wetted river length continued to increase for around 2 d before it stabilized and started decreasing around 3 d after peak flow. Hereafter, we refer to the periods during which the wetted river length is either increasing or stable as “wetting phases” and the periods during which the wetted river length is decreasing as “drying phases”. For this event, transmission losses estimated using the satellite images and the rated flow at the gauging station were maximum at peak flow around 1.2 m3 s−1 km−1. Then, they decreased linearly with the logarithm of the time since the last peak during the wetting phase (Fig. 7). Finally, they stabilized around 0.35 m3 s−1 km−1, 3–4 d after the peak, during the drying phase. The transmission losses were non-linearly positively correlated with the river stage, with a relationship resembling a polynomial function (Fig. 8). Furthermore, the transmission losses estimated for 24 September 2020 from differential gauging and from the drying front location identified on a satellite image correspond well given their respective uncertainties.

Figure 6. Time series of the river discharge (black dotted line), wetted river length (black dashed error bars and triangles) and transmission losses (solid error bars and circles) estimated using differential gauging (“Gauging”, yellow) and river drying front locations identified on satellite imagery (“Satellite”, purple) during the September 2020 selected event.


Figure 7. Selwyn River transmission losses estimated using differential gauging (“Gauging”, yellow) and river drying front locations identified on satellite imagery (“Satellite”, purple) as a function of the time since the last peak (log10 scale) during the September 2020 selected event.


Figure 8. Selwyn River transmission losses estimated using differential gauging (“Gauging”, yellow) and river drying front locations identified on satellite imagery (“Satellite”, purple) as a function of the river stage during the September 2020 selected event.


4.2 Complete dataset of transmission losses

The transmission loss time series, estimated using differential gauging, field GPS points and satellite images, follow the pattern described in Sect. 4.1 but for many more events of different magnitude (Fig. 9). The estimated transmission losses range from 0.14 to 1.55 m3 s−1 km−1. The average value of the estimated transmission losses is 0.44 m3 s−1 km−1 and the median is 0.41 m3 s−1 km−1. The upper and lower quartiles are 0.47 and 0.37 m3 s−1 km−1, respectively. A duration (cumulative frequency) curve calculated from this dataset is shown in Sect. 4.3. Most of the estimated losses (58 %) are below 0.60 m3 s−1 km−1 and correspond mainly to baseflow periods and river drying phases. The lowest values are found during dry periods, from March to May 2021, when the river stage and discharge were low. The highest losses occur shortly after high-flow events during wetting phases. Although it can be noted that the differential gauging estimates are lower in most instances, the transmission losses calculated with the different approaches correspond well given their respective uncertainties.

When the river stage and discharge became particularly low after April 2021, the river length downstream of our gauging station decreased to a few hundred metres. As a consequence, the uncertainties in our transmission loss estimates increased drastically. In the rest of this article, we exclude the estimates for which the uncertainty exceeds 45 % of the estimated value.
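The estimation and screening steps described above can be sketched as follows. This is a minimal illustration and not the authors' processing script: the names `LossEstimate`, `loss_from_wetted_length` and `screen_estimates`, the example numbers, and the representation of the uncertainty as an absolute half-width in the same units as the estimate are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class LossEstimate:
    """One transmission loss estimate and its absolute uncertainty (m3 s-1 km-1)."""
    value: float
    uncertainty: float


def loss_from_wetted_length(discharge_m3s: float, wetted_length_km: float) -> float:
    """Average loss along the wetted reach: upstream flow divided by wetted length."""
    if wetted_length_km <= 0:
        raise ValueError("wetted length must be positive")
    return discharge_m3s / wetted_length_km


def screen_estimates(estimates, max_relative_uncertainty=0.45):
    """Keep estimates whose uncertainty does not exceed the given fraction of their value."""
    return [e for e in estimates if e.uncertainty <= max_relative_uncertainty * e.value]


# Hypothetical values, for illustration only.
estimates = [
    LossEstimate(loss_from_wetted_length(2.0, 5.0), 0.12),   # about 0.40 with 30 % uncertainty: kept
    LossEstimate(loss_from_wetted_length(0.06, 0.3), 0.15),  # about 0.20 with 75 % uncertainty: excluded
]
kept = screen_estimates(estimates)
```

The second example mimics the situation described above: a very short wetted length yields an estimate whose relative uncertainty is too large to be retained.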

The relationship between the estimated transmission losses and the river stage is presented in Fig. 10 using the time since the last peak (log10 transformed) as the colour scale and the height of these peaks as the point size scale. Note that, on 3 April 2021, the satellite image was taken 3 h before the peak flow was reached, and we therefore used the time to the peak instead of the time since the peak. At low flow (up to 1 m stage and 1 m3 s−1 discharge), the relationship between the river stage and the transmission losses is relatively linear, and the estimated transmission losses vary from 0.14 to 0.80 m3s-1km-1. At higher flow (> 1 m stage and 1 m3 s−1 discharge), transmission losses stop increasing linearly and reach a plateau around 0.45 m3s-1km-1. As explained in Sect. 4.1, transmission losses decrease linearly with the logarithm of the time since the last peak during wetting phases. The peak height appears to control the maximum values estimated during peak flows. Small peaks have only a minor impact, even on losses estimated shortly after peak flows. However, transmission losses estimated shortly after higher peak flows are very dependent on the time since the last peak and could reach more than 1 m3s-1km-1 in several instances (Fig. 11). The relation between the transmission loss behaviour and hydrological processes is further discussed in Sect. 5.1.
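The two hydrograph-derived quantities used in this analysis, the time since the last peak and the height of that peak, could be computed along the following lines. This is only a sketch: the peak detection rule (a local maximum of the stage series that reaches a minimum height) and the names `peak_indices`, `peak_predictors` and `min_height` are assumptions made for illustration, not the procedure used in this study.

```python
def peak_indices(stage, min_height):
    """Indices of local maxima of the stage series that reach at least min_height."""
    return [
        i
        for i in range(1, len(stage) - 1)
        if stage[i] >= min_height and stage[i] > stage[i - 1] and stage[i] >= stage[i + 1]
    ]


def peak_predictors(times_h, stage, min_height):
    """For each time step, return (hours since the last peak, height of that peak).

    Time steps before the first detected peak get (None, None).
    """
    peaks = set(peak_indices(stage, min_height))
    last = None
    out = []
    for i, t in enumerate(times_h):
        if i in peaks:
            last = i  # a new peak resets the clock
        if last is None:
            out.append((None, None))
        else:
            out.append((t - times_h[last], stage[last]))
    return out


# Hypothetical hourly stage record (m), for illustration only.
times_h = [0, 1, 2, 3, 4, 5]
stage = [0.5, 0.9, 1.4, 1.1, 0.8, 0.7]
predictors = peak_predictors(times_h, stage, min_height=1.0)
```

A log10 transform of the elapsed time, as used for the colour scale, is only defined for strictly positive values, so the time step of the peak itself would need separate handling.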

Figure 9. Time series of the river discharge (black dotted line) and transmission losses (error bars) estimated using the differential gauging (“Gauging”, yellow) and the river drying front methods with field GPS measurements (“GPS”, blue green) and satellite imagery (“Satellite”, purple).


Figure 10. Estimated transmission losses using differential gauging, field GPS points and satellite images as a function of the river stage. The colour scale represents the time since the last peak (log10 transformed) and the point size scale represents the peak height. Triangles indicate the September 2020 event presented in Fig. 8 and circles the other data points.


Figure 11. Estimated transmission losses using differential gauging, field GPS points and satellite images as a function of the time since the last peak. The colour scale represents the river discharge (log10 transformed) and the point size scale represents the peak height. Triangles indicate the September 2020 event presented in Fig. 7 and circles the other data points.


4.3 Predicted transmission loss time series

The time series predicted using the random forest models is presented in Fig. 12 and the estimated and predicted duration curves, derived for the period between 1 May 2020 and 1 May 2021, in Fig. 13. The random forest models managed to reproduce most of the features observed in the estimated transmission loss dataset and the associated uncertainties. The predicted transmission losses range between 0.16 and 1.41 m3s-1km-1 with a time average value of 0.42 m3s-1km-1. This is slightly narrower than the estimated range (0.14 to 1.55 m3s-1km-1) but with a similar time average value. When the model ensemble is evaluated at the times of the estimated points, the ensemble average values correspond well to the estimated average values, with an RMSE of 0.04 m3s-1km-1 and an NRMSE of 12 %. When the random forest realizations are analysed separately, the average RMSE of the individual model fits is 0.07 m3s-1km-1 on the whole datasets and 0.12 m3s-1km-1 on the evaluation datasets. This corresponds to average NRMSEs of 17 % and 28 %, respectively. The predicted duration curve indicates that for 56 % of the studied year, the Selwyn River transmission losses downstream of our flow gauging station were between 0.25 and 0.65 m3s-1km-1.
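The performance metrics and the duration-curve statistic quoted above can be illustrated with a short sketch. The normalization used for the NRMSE is not restated in this section, so dividing the RMSE by the mean of the estimated values is an assumption here, as are the function names and the example numbers.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def fraction_within(series, lower, upper):
    """Fraction of time steps with a value inside [lower, upper]."""
    if not series:
        raise ValueError("series must be non-empty")
    return sum(lower <= v <= upper for v in series) / len(series)


# Hypothetical estimated and predicted losses (m3 s-1 km-1), for illustration only.
estimated = [0.40, 0.50, 0.30]
predicted = [0.44, 0.46, 0.30]
error = rmse(estimated, predicted)
relative_error = nrmse(estimated, predicted)
share = fraction_within([0.20, 0.30, 0.50, 0.70], 0.25, 0.65)
```

Applied to a regular hourly series, `fraction_within` gives the proportion of the period during which the losses stayed inside a given band, which is the kind of statement read from the duration curve.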

Figure 12. Transmission loss time series predicted (cyan) using the random forest models trained on the transmission loss data points estimated (orange) using field data and satellite images.


Figure 13. Estimated (empirical distribution, orange) and predicted (simulated distribution, cyan) transmission loss duration curves derived for the period between 1 May 2020 and 1 May 2021.


5 Discussion

5.1 Distributed groundwater recharge versus local storage replenishment

We have shown in Sect. 4.1 and 4.2 that the transmission losses in the Selwyn River relate differently to the river stage and flow depending on whether the river is in a drying or wetting phase (first few days after peak flow). The different processes lumped together in the transmission losses can explain these contrasting behaviours. Transmission losses generally consist of evapotranspiration and groundwater recharge. Given the sparse vegetation and the relatively high transmission losses at our study site, most of the water is expected to be lost to the groundwater, although we did not conduct a formal estimation of the respective contributions. In the remainder of this section, we assume that the estimated transmission losses represent the groundwater recharge and neglect other natural or artificial gains and losses. Furthermore, we hypothesize that the river is losing water to the groundwater in two different modes, depending on whether the river is in a wetting or drying phase.

  • During drying phases: the river and its braidplain aquifer lose water to the underlying deeper aquifer all along its wetted length, depending on local hydraulic, geomorphological and geological properties.

  • During wetting phases: the river and its braidplain aquifer still lose water to the underlying deeper aquifer as during drying phases, but additionally the advancing wetting front is refilling the braidplain aquifer storage. This explains the highest losses estimated shortly after peak flow during wetting phases.

The transmission losses estimated using the method presented in this study are an average along the wetted river length. During drying phases, the wetted river length is linearly correlated with the river discharge (Fig. 14). This suggests that the recharge to the deeper aquifer is rather constant along the studied reach. Furthermore, this justifies the comparison between the transmission losses derived from the differential gauging and from the river drying front locations, although they represent losses at different scales. However, during wetting phases, a considerable amount of water is lost at the wetting front to the braidplain aquifer, and therefore the losses are not equally distributed along the river reach. As a consequence, the highest losses are not representative of the spatially distributed recharge to the deeper aquifer, and their values in terms of m3s-1km-1 should be interpreted with caution.
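One way to see why a linear relation between wetted river length and discharge points to a spatially uniform loss is the following sketch. The symbols are introduced here for illustration only and are not the authors' notation: Q_0 is the discharge at the gauging station, x the distance downstream, q a loss per unit river length assumed constant along the reach, and L the wetted river length.

```latex
\begin{align}
Q(x) &= Q_0 - q\,x, \\
Q(L) &= 0 \quad\Longrightarrow\quad L = \frac{Q_0}{q}, \\
\frac{Q_0}{L} &= q.
\end{align}
```

Under this assumption, L is proportional to Q_0 and the ratio Q_0/L returns q. An additional loss concentrated at the advancing front, as hypothesized for wetting phases, would break this proportionality.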

Applying our framework to the Selwyn River improved our understanding of the interactions between surface water and groundwater in this particular system. However, many unknowns remain, including the quantity of water lost at the wetting front to the braidplain aquifer during wetting phases. This quantity should depend on the volume of aquifer to wet and its porosity. The water table of the deeper aquifer under the Selwyn River at the study reach lies more than 15 m below the surface, and the water recharging this deeper aquifer is thought to flow through a variably saturated zone (Larned et al., 2011, 2008; Banks et al., 2022; Vincent, 2005). Therefore, a significant volume of water could be lost at the wetting front to refill the braidplain aquifer when the river is advancing. An ongoing research project aims at clarifying how the Selwyn River is interacting with its braidplain aquifer, the underlying unsaturated zone and the deeper aquifers. The two modes of groundwater recharge identified in the Selwyn River could also occur in other ephemeral river systems. Applying the framework presented in this article to other systems could help to understand them better.

Figure 14. Wetted river length as a function of the river discharge, only shown for data points collected during river drying phases (more than 60 h after a peak flow).


5.2 Comparison with previous studies

Rupp et al. (2008) estimated transmission losses along the Selwyn River by performing river gauging manually at 18 cross sections on a limited number of days (4 to 60, depending on the cross section) between October 2003 and January 2007. The average transmission losses that they estimated between the cross sections downstream of the gauging station used in our study (i.e. Scotts Road, Fig. 1) were mostly between 0.2 and 0.5 m3s-1km-1. This is in the lower range of our baseflow estimates. A more detailed comparison is difficult as our estimates differ in their spatial and temporal extents.

In a series of articles (Larned et al., 2011, 2010, 2008; Rupp et al., 2008), the ELFMOD model has been used to reconstruct the flow along the Selwyn River. Another output of the ELFMOD model is the flow permanence along the river, which was estimated to be between 20 % and 75 % in one of the driest reaches of the river, around 10 km downstream of Scotts Road (our gauging station). In this regard, our results differ significantly: our predicted wetted river length extends to the Hororata River confluence (15 km downstream of our gauging station, where the Selwyn River is gaining water again) only during peak flows (Fig. 15). Our reconstructed flow permanence curve (Fig. 16) indicates that the river was dry more than 90 % of the time 10 km downstream of Scotts Road during our study period. This discrepancy appears as well in the dataset used to train the models. Among the 153 drying front locations that we have identified on the satellite images and in the field between March 2020 and May 2021, none shows the river flowing continuously to the Hororata River confluence. On the other hand, the data reported by Rupp et al. (2008), collected on 118 d between October 2003 and January 2007, show that when the river flow at Coes Ford (50 km downstream of our gauging station) was greater than twice the median, the entire Selwyn River was flowing.
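The flow permanence at a given distance downstream of the gauging station can be derived from a wetted river length time series as sketched below. The sketch assumes a regular time step and a single drying front, so that the river flows continuously from the gauge to the front; the function name and the example values are hypothetical.

```python
def flow_permanence(wetted_lengths_km, distance_km):
    """Fraction of time steps at which the wetted length reaches at least distance_km."""
    if not wetted_lengths_km:
        raise ValueError("the wetted length series must be non-empty")
    reached = sum(length >= distance_km for length in wetted_lengths_km)
    return reached / len(wetted_lengths_km)


# Hypothetical wetted lengths (km downstream of the gauge), for illustration only.
lengths = [2.0, 4.5, 12.0, 16.0, 3.0]
permanence_at_gauge = flow_permanence(lengths, 0.0)
permanence_at_10_km = flow_permanence(lengths, 10.0)

# Longitudinal permanence curve, one value per kilometre from 0 to 15 km.
curve = [flow_permanence(lengths, float(d)) for d in range(16)]
```

By construction the resulting curve can only decrease with distance, which reflects the single-front assumption and would not hold for a river with several disconnected wetted reaches.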

The different results could be explained by the different approaches employed but more likely by the hydrological variability between the study periods. The period between March 2020 and May 2021 was particularly dry in the Canterbury region (NIWA, 2021, 2020). This led to low water levels and storage in the braidplain aquifer (Banks et al., 2022) and limited the ability of this shallow aquifer to sustain the river flow as much as in a wetter year. Furthermore, a longer-term trend of decreasing low flow and wetted river length of the Selwyn River was highlighted by McKerchar and Schmidt (2007) and Rupp et al. (2008) for the period between 1984 and 2006. More research would be needed to investigate how the recent period studied in our work (2020–2021) falls within this longer-term trend. A long-term flow record has been available since 1964 at the Whitecliffs site (10 km upstream of Scotts Road) and could be used together with satellite images to investigate the Selwyn River transmission loss inter-annual variations. The Planet Monitoring library (Planet Team, 2017) used in the present study is only available from 2009 onwards, but other resources might be used to cover a longer time frame, although the resolution and frequency of available images in the more distant past will be lower. Moreover, the transmission loss estimates between Whitecliffs and Scotts Road would be more difficult to interpret because they would also include a constrained and gaining reach and thus a large spatial variability of transmission losses along the extended reach.

Comparing our results to the dataset including 73 reaches from 31 streams sourced from different studies by McMahon and Nathan (2021) indicates that the mean reach transmission losses per event predicted for the Selwyn River (0.42 GL/km) are much higher than the median of the dataset (0.046 GL/km) but lower than the 90th percentile (1.10 GL/km). In this regard, the Selwyn River transmission losses appear to be rather high. However, the transmission losses in the Selwyn River are still considerably lower than those estimated in large ephemeral rivers under an arid climate (e.g. Lange, 2005, reported a mean of 6.13 GL/km and Jarihani et al., 2015, a mean of 6.79 GL/km). An important difference is that we have estimated the transmission losses, including the water lost at the drying front. This affected our largest loss estimates and the relationship between transmission losses and river stage and discharge. The only other application of the approach followed in this study was conducted by Walter et al. (2012) on a larger river but using only five satellite images. Their estimates ranged between 0.15 and 0.25 m3s-1km-1. This is lower than those estimated in this study for the Selwyn River and could be explained by the higher sediment permeability at our study site. Unfortunately, a comparison of the time dynamics of the estimated transmission losses and their relationship with the river stage and discharge is not possible because of the limited number of data points reported by Walter et al. (2012). More studies using this approach would be needed to investigate how this varies between ephemeral river systems. The increasing availability of satellite images should make that possible in the future.

Figure 15. Wetted river length time series predicted using the random forest model ensemble.


Figure 16. Longitudinal variation in flow permanence (proportion of the year with flowing water) downstream of our gauging station (Scotts Road = 0 km) predicted using the random forest model ensemble.


5.3 Uncertainty sources and propagation

In this study, we have carried out a comprehensive assessment of the different sources of uncertainty affecting our transmission loss estimation and prediction. Concerning the transmission loss estimates made using the satellite images, the uncertainties range from 30 % to 55 %. On the one hand, the uncertainties in the river discharge derived from the rating curve represent around 20 % (Appendix A). On the other hand, the uncertainties in the river drying front locations and wetted river lengths represent 10 % to 30 %, with increasing contributions for a smaller wetted river length. The estimates made using the field GPS points are less uncertain because the river drying front location was virtually exact. As a result, the uncertainties are around 30 % of the estimated values, around 20 % coming from the river discharge and 10 % from the wetted river lengths. For both methods, the uncertainties due to the river stage measurements are relatively low, below 4 %.

Regarding the transmission loss estimates derived from the differential gauging campaigns, the uncertainties vary between 5 % and 45 %, depending on the measurement uncertainties (between 2.7 % and 6.3 %) and the ratio between the transmission losses and the river discharge (Fig. 5). At low flow, the differences between individual flow measurements (i.e. transmission losses) are large compared to the measurement uncertainties, which leads to relatively small uncertainties in the transmission loss estimates. However, at high flow, the differences between individual flow measurements are small compared to the measurement uncertainties, and therefore the resulting uncertainties in the transmission loss estimates are high. Overall, we can state that quantifying the transmission losses from satellite imagery at our study site does not introduce much more uncertainty than using the traditional method of differential flow gauging.
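The dependence of the differential gauging uncertainty on the ratio between losses and discharge can be illustrated with a simple error propagation sketch. It assumes independent gauging errors combined in quadrature, the same relative measurement uncertainty at both sections, and an exactly known reach length; this is not necessarily the propagation scheme used in this study, and the function name and example flows are hypothetical.

```python
import math


def differential_loss(q_up, q_down, reach_km, rel_meas_unc):
    """Return (loss per km, relative uncertainty of the loss) for a gauged reach.

    Assumes independent gauging errors combined in quadrature and an exact reach length.
    """
    diff = q_up - q_down
    if diff <= 0 or reach_km <= 0:
        raise ValueError("expected a losing reach of positive length")
    sigma_diff = math.hypot(rel_meas_unc * q_up, rel_meas_unc * q_down)
    return diff / reach_km, sigma_diff / diff


# Same absolute loss (0.5 m3 s-1 over 1 km) and 5 % gauging uncertainty at two flow levels.
low_loss, low_rel = differential_loss(1.0, 0.5, 1.0, 0.05)
high_loss, high_rel = differential_loss(3.0, 2.5, 1.0, 0.05)
```

With these illustrative numbers the relative uncertainty of the loss is about 11 % at the lower flow and about 39 % at the higher flow, although the loss itself is identical.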

Considering all our estimates used to train the random forest regressors, the propagated measurement uncertainties show an average value of 35 %. This is higher than the normalized root mean square of the random forest fitting errors (NRMSEs) calculated on the whole datasets (from 12 % to 26 %) and in the range of the NRMSEs calculated on the test datasets (from 16 % to 48 %). Moreover, the uncertainties in our estimated values are larger than the NRMSE calculated by comparing the average predicted and average estimated values (12 %). Therefore, we can state that our random forest ensemble reproduces our transmission loss estimates satisfactorily, considering the measurement uncertainties.

5.4 Advantages and limitations of our approach and ways forward

Quantification of the transmission losses using the framework described in this article has many advantages over traditional methods but is also limited by our ability to identify the drying fronts on the satellite images and to predict the continuous hourly time series from the obtained data points.

The main advantage of our method is the reduced amount of fieldwork needed to produce high time resolution transmission loss estimates. Our framework only requires the installation and maintenance of a flow gauging station, which is common on many rivers. Another requirement is the availability of clear satellite imagery with a spatial resolution finer than the river width. In our study, we used the Planet Monitoring collection (Planet Team, 2017), which is freely available to university-affiliated students and researchers through their Education and Research (E&R) Program. The 3.7 m resolution of these images was just enough to identify the river drying fronts, as the Selwyn River width is often less than 10 m. To apply the same approach to smaller rivers, other satellite resources exist (Maxar Team, 2022; Planet Team, 2017) and pre-processing of the satellite imagery could help (Callo, 2022). However, the time gap between two high-resolution images from other libraries is longer than the time gap between images from the Planet Monitoring collection. Another issue with the use of satellite images would be the presence of dense riparian vegetation or clouds, which could hinder our ability to identify the drying front on the images. In particular, clouds tend to obscure satellite images during higher flows, as they tend to occur during or shortly after rainfall events. However, in the future, we expect that more high-resolution and high-frequency satellite images will be available to researchers. This should make the approach presented in this article more attractive and feasible, even for smaller rivers. Furthermore, several algorithms have been developed to automatically identify water-covered areas from satellite images (Feyisa et al., 2014; Munasinghe et al., 2018; Sagin et al., 2015). The difficulties described previously might complicate their utilization for our purpose, but an investigation of the possibilities could be beneficial to future applications, especially for longer time series.

Using random forest regressors enabled us to predict well the hourly transmission loss time series and their uncertainties without requiring much effort and computational resources. Thanks to our processing of the hydrograph to calculate the time since the last peak and the peak height for each transmission loss estimate, we could predict the transmission loss time series only using the river stage and flow time series. This provides us with a continuous hourly record of transmission losses, which is particularly useful for further work. First, this record was used within this study to investigate the exceedance probabilities and draw the duration curve. Second, the continuous transmission loss record can be used to evaluate physically based models. Third, there is some interest in predicting continuous records of both transmission losses and wetted river length for water management in this catchment. This is likely to be the case in other catchments as well. However, an important shortcoming of our modelling is that the predicted transmission losses during the highest flow peaks (end of June and early November 2020) are not higher than the predicted losses during lower peak flow events. This is due to the lack of data immediately after (< 23 h) these highest peaks. The random forest models are then unable to extrapolate predictions outside of the conditions they have been trained on. Many other kinds of statistical models and machine learning algorithms exist and have been applied in hydrology (Shen et al., 2021; Solomatine and Ostfeld, 2008). Although some other machine learning algorithms could have some advantages over random forests, the issue with extrapolation is inherent to this kind of model, which lacks a representation of hydrological processes. They are therefore unlikely to give robust predictions of the response variable outside of the training conditions (e.g. for future scenario simulation). A more robust alternative could lie in physically based models. One of the main motivations behind this work is to use our estimated and predicted time series to evaluate different physically based models, which can then be used for simulation of future scenarios.
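The extrapolation limit discussed above can be made concrete with a single regression tree, the building block of a random forest. The sketch below is a deliberately minimal one-predictor tree written for illustration; it is not the modelling code used in this study, and the training values are hypothetical. Because each leaf predicts the mean of the training targets it contains, the prediction for a peak higher than any in the training data cannot exceed the largest training target. An average over many such trees fitted to resampled data inherits the same bound.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(xs, ys, min_leaf=2):
    """Fit a one-predictor regression tree by recursive least-squares splitting.

    Returns ("leaf", mean) or ("split", threshold, left_subtree, right_subtree).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical predictor values
        cost = _sse([y for _, y in pairs[:k]]) + _sse([y for _, y in pairs[k:]])
        if best is None or cost < best[0]:
            best = (cost, k)
    if best is None:
        return ("leaf", sum(ys) / len(ys))
    k = best[1]
    threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
    left_x, left_y = zip(*pairs[:k])
    right_x, right_y = zip(*pairs[k:])
    return ("split", threshold,
            fit_tree(left_x, left_y, min_leaf),
            fit_tree(right_x, right_y, min_leaf))


def predict(tree, x):
    """Walk down the tree and return the mean stored in the leaf that x falls into."""
    while tree[0] == "split":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]


# Hypothetical training data: peak height (m) against loss shortly after the peak.
peak_heights = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
losses = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
tree = fit_tree(peak_heights, losses)
inside = predict(tree, 2.0)   # largest peak in the training data
outside = predict(tree, 3.0)  # higher than any training peak
```

Here `outside` equals `inside`: the tree returns the same leaf mean for any peak above its last split threshold, however high the peak.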

6 Conclusions

We presented a framework to estimate the transmission losses in ephemeral rivers from satellite imagery and to predict their continuous hourly time series using random forest models. This framework was successfully applied to the Selwyn River (Canterbury, New Zealand) for the period between March 2020 and May 2021. It is an efficient approach to quantify transmission losses in ephemeral rivers. The method has the advantage of requiring less fieldwork and generating more data than traditional methods like differential flow gauging at a similar accuracy. Our results show that the transmission losses in the Selwyn River downstream of our gauging station were between 0.25 and 0.65 m3s-1km-1 during most of the study period. However, shortly after peak flow, when the river was advancing and wetting the surrounding sediments (i.e. wetting phases), the losses could reach up to 1.55 m3s-1km-1. This compares quite well with previous estimates of transmission losses in the study area. However, we observed and predicted a much drier Selwyn River than reported in other studies. This is probably due to our study period being drier, but it is unclear how this relates to decadal trends. Furthermore, studying the relationship between the transmission losses and the river stage and discharge enabled us to improve our understanding of the Selwyn River interactions with groundwater. We believe that the generated transmission loss time series provide a valuable dataset to support further research efforts, especially the development of physically based models. Moreover, the presented framework has the potential to help water management in this catchment and beyond by providing an approach to simulate the transmission losses, groundwater recharge and wetted river length. Our framework is easily transferable to other ephemeral rivers and can be applied to longer time series. This could provide important information at a relatively low cost.

7 Future work

Some aspects of the groundwater–surface water interactions at our study site still need to be investigated in more detail. On the one hand, there is evidence of complex interactions, variable in space and time, between the Selwyn River and its braidplain aquifer. On the other hand, the infiltration from the braidplain to the deeper aquifer might be a simpler process, as suggested by the relatively stable losses estimated during drying phases in this study. Further research is needed to understand better these processes, their spatio-temporal variability and how they can be appropriately simulated. This is the focus of an ongoing research programme within which piezometers have been installed to monitor the water level and temperature in the shallow (braidplain) and deeper aquifers. In addition, active distributed temperature sensing surveys are being carried out to assess the small-scale variability of groundwater–surface water interactions at our study site (Banks et al., 2022). Furthermore, we are developing physically based models of various complexities to represent the river–aquifer system and to enable us to get further insights into the system response and to simulate future scenarios.

Appendix A: Wetted river length determination

The wetted river lengths following the active river channel, the riverbed and the braidplain are presented in Fig. A1, using the satellite image taken on 27 January 2021 as an example.

Figure A1. Wetted river lengths following the active river channel, the riverbed and the braidplain as considered in the study. The satellite image was taken on 27 January 2021, and the river drying front identified for this day is indicated in the image. Image credit to the Planet Team (2017).


Data availability

Data are available on request from the authors, except for satellite images that are owned by Planet Labs.

Author contributions

ADC and SW conceptualized the study and developed the methodology. ADC, SW and JK collected the data. ADC analysed the data and developed the modelling and visualization scripts. ADC wrote the manuscript, and all the authors reviewed it. SW and TW acquired the funding and guided the research.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

We would like to thank the New Zealand Ministry of Business, Innovation and Employment for funding this research through the project “Subsurface processes in braided rivers – hyporheic exchange and leakage to groundwater”. We are also grateful to Planet Labs for granting us access to part of their satellite imagery collection and Environment Canterbury for letting us use their field data. We thank Aaron Dutton, who helped with the data collection in the field. Lastly, we would like to express our gratitude to Howard Wheater and an anonymous reviewer for their valuable comments, which helped us improve the quality of this article.

Financial support

This research has been supported by the Ministry of Business, Innovation and Employment (grant no. LVLX1901).

Review statement

This paper was edited by Efrat Morin and reviewed by two anonymous referees.


References

Anderson, E. I.: Modeling groundwater–surface water interactions using the Dupuit approximation, Adv. Water Resour., 28, 315–327, 2005.

Arscott, D. B., Larned, S., Scarsbrook, M. R., and Lambert, P.: Aquatic invertebrate community structure along an intermittence gradient: Selwyn River, New Zealand, J. N. Am. Benthol. Soc., 29, 530–545, 2010.

Banks, E. W., Morgan, L. K., Sai Louie, A. J., Dempsey, D., and Wilson, S. R.: Active distributed temperature sensing to assess surface water–groundwater interaction and river loss in braided river systems, J. Hydrol., 615, 128667, 2022.

Baudron, P., Alonso-Sarría, F., García-Aróstegui, J. L., Cánovas-García, F., Martínez-Vicente, D., and Moreno-Brotóns, J.: Identifying the origin of groundwater samples in a multi-layer aquifer system with Random Forest classification, J. Hydrol., 499, 303–315, 2013.

Booker, D. J. and Woods, R. A.: Comparing and combining physically-based and empirically-based approaches for estimating the hydrology of ungauged catchments, J. Hydrol., 508, 227–239, 2014.

Breiman, L.: Random Forests, Mach. Learn., 45, 5–32, 2001.

Brown, L. J., Dravid, P. N., Hudson, N. A., and Taylor, C. B.: Sustainable groundwater resources, Heretaunga Plains, Hawke's Bay, New Zealand, Hydrogeol. J., 7, 440–453, 1999.

Burden, R. J.: Chemical zonation in groundwater of the Central Plains, Canterbury, J. Hydrol., 23, 100–119, 1984.

Callo, J. A. R.: Estimating specific leakage rates to alluvial gravel aquifers using remote sensing data with a water balance – based approach in the Omaka and Taylor ephemeral rivers, New Zealand, Master thesis, Technische Universität Dresden, 2022.

Coluccio, K. and Morgan, L. K.: A review of methods for measuring groundwater–surface water exchange in braided rivers, Hydrol. Earth Syst. Sci., 23, 4397–4417, 2019.

Cook, P. G.: Quantifying river gain and loss at regional scales, J. Hydrol., 531, 749–758, 2015.

Datry, T., Larned, S., and Scarsbrook, M. R.: Responses of hyporheic invertebrate assemblages to large-scale variation in flow permanence and surface–subsurface exchange, Freshwater Biol., 52, 1452–1462, 2007.

Datry, T., Larned, S. T., and Tockner, K.: Intermittent Rivers: A Challenge for Freshwater Ecology, BioScience, 64, 229–235, 2014.

Desai, S. and Ouarda, T. B. M. J.: Regional hydrological frequency analysis at ungauged sites with random forest regression, J. Hydrol., 594, 125861, 2021.

Di Baldassarre, G. and Montanari, A.: Uncertainty in river discharge observations: a quantitative analysis, Hydrol. Earth Syst. Sci., 13, 913–921, 2009.

Di Ciacca, A., Leterme, B., Laloy, E., Jacques, D., and Vanderborght, J.: Scale-dependent parameterization of groundwater–surface water interactions in a regional hydrogeological model, J. Hydrol., 576, 494–507, 2019.

Dravid, P. N. and Brown, L. J.: Heretaunga Plains groundwater study, Hawke's Bay Regional Council (last access: August 2022), 1997.

Fatichi, S., Vivoni, E. R., Ogden, F. L., Ivanov, V. Y., Mirus, B., Gochis, D., Downer, C. W., Camporese, M., Davison, J. H., and Ebel, B.: An overview of current applications, challenges, and future trends in distributed process-based models in hydrology, J. Hydrol., 537, 45–60, 2016.

Feyisa, G. L., Meilby, H., Fensholt, R., and Proud, S. R.: Automated Water Extraction Index: A new technique for surface water mapping using Landsat imagery, Remote Sens. Environ., 140, 23–35, 2014.

Fleckenstein, J. H., Krause, S., Hannah, D. M., and Boano, F.: Groundwater-surface water interactions: New methods and models to improve understanding of processes and dynamics, Adv. Water Resour., 33, 1291–1295, 2010.

Fovet, O., Belemtougri, A., Boithias, L., Braud, I., Charlier, J.-B., Cottet, M., Daudin, K., Dramais, G., Ducharne, A., Folton, N., Grippa, M., Hector, B., Kuppel, S., Le Coz, J., Legal, L., Martin, P., Moatar, F., Molénat, J., Probst, A., Riotte, J., Vidal, J.-P., Vinatier, F., and Datry, T.: Intermittent rivers and ephemeral streams: Perspectives for critical zone science and research on socio-ecosystems, WIREs Water, 8, e1523, 2021.

González-Pinzón, R., Ward, A. S., Hatch, C. E., Wlostowski, A. N., Singha, K., Gooseff, M. N., Haggerty, R., Harvey, J. W., Cirpka, O. A., and Brock, J. T.: A field comparison of multiple techniques to quantify groundwater–surface-water interactions, Freshw. Sci., 34, 139–160, 2015.

Harbaugh, A. W.: MODFLOW-2005, the U.S. Geological Survey modular ground-water model – the Ground-Water Flow Process. U.S. Geological Survey Techniques and Methods 6-A16, Tech. rep., 2005.

Harbaugh, A. W., Banta, E. R., Hill, M. C., and McDonald, M. G.: MODFLOW-2000, the U.S. Geological Survey modular ground-water model user guide to modularization concepts and the ground-water flow process, Tech. rep., Denver, CO, 2000.

Hatch, C. E., Fisher, A. T., Revenaugh, J. S., Constantz, J., and Ruehl, C.: Quantifying surface water–groundwater interactions using time series analysis of streambed thermal records: Method development, Water Resour. Res., 42, W10410, 2006.

Hoehn, E. and Von Gunten, H. R.: Radon in groundwater: A tool to assess infiltration from surface waters to aquifers, Water Resour. Res., 25, 1795–1803, 1989.

James, G., Witten, D., Hastie, T., and Tibshirani, R.: An introduction to statistical learning, vol. 112, Springer, ISBN 978-1-4614-7138-7, 2013.

Jarihani, A. A., Larsen, J. R., Callow, J. N., McVicar, T. R., and Johansen, K.: Where does all the water go? Partitioning water transmission losses in a data-sparse, multi-channel and low-gradient dryland river system using modelling and remote sensing, J. Hydrol., 529, 1511–1529, 2015.

Kalbus, E., Reinstorf, F., and Schirmer, M.: Measuring methods for groundwater – surface water interactions: a review, Hydrol. Earth Syst. Sci., 10, 873–887, 2006.

Knoll, L., Breuer, L., and Bach, M.: Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning, Sci. Total Environ., 668, 1317–1327, 2019. a

Koch, J., Stisen, S., Refsgaard, J. C., Ernstsen, V., Jakobsen, P. R., and Højberg, A. L.: Modeling depth of the redox interface at high resolution at national scale using random forest and residual gaussian simulation, Water Resour. Res., 55, 1451–1469, 2019. a

Kuffour, B. N. O., Engdahl, N. B., Woodward, C. S., Condon, L. E., Kollet, S., and Maxwell, R. M.: Simulating coupled surface–subsurface flows with ParFlow v3.5.0: capabilities, applications, and ongoing development of an open-source, massively parallel, integrated hydrologic model, Geosci. Model Dev., 13, 1373–1397, 2020.

Kuhn, M. and Wickham, H.: Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles (last access: August 2022), 2020.

Lange, J.: Dynamics of transmission losses in a large arid stream channel, J. Hydrol., 306, 112–126, 2005.

Langevin, C. D., Hughes, J. D., Banta, E. R., Niswonger, R. G., Panday, S., and Provost, A. M.: Documentation for the MODFLOW 6 groundwater flow model, Tech. rep., 2017.

Larned, S. T., Datry, T., and Robinson, C. T.: Invertebrate and microbial responses to inundation in an ephemeral river reach in New Zealand: effects of preceding dry periods, Aquatic Sci., 69, 554–567, 2007.

Larned, S. T., Hicks, D. M., Schmidt, J., Davey, A. J. H., Dey, K., Scarsbrook, M., Arscott, D. B., and Woods, R. A.: The Selwyn River of New Zealand: a benchmark system for alluvial plain rivers, River Res. Appl., 24, 1–21, 2008.

Larned, S. T., Arscott, D. B., Schmidt, J., and Diettrich, J. C.: A Framework for Analyzing Longitudinal and Temporal Variation in River Flow and Developing Flow-Ecology Relationships, J. Am. Water Resour. As., 46, 541–553, 2010.

Larned, S. T., Schmidt, J., Datry, T., Konrad, C. P., Dumas, J. K., and Diettrich, J. C.: Longitudinal river ecohydrology: flow variation down the lengths of alluvial rivers, Ecohydrology, 4, 532–548, 2011.

Le Lay, H., Thomas, Z., Rouault, F., Pichelin, P., and Moatar, F.: Characterization of Diffuse Groundwater Inflows into Stream Water (Part II: Quantifying Groundwater Inflows by Coupling FO-DTS and Vertical Flow Velocities), Water, 11, 2430, 2019.

Lee, D. R.: A device for measuring seepage flux in lakes and estuaries, Limnol. Oceanogr., 22, 140–147, 1977.

Lee, D. R. and Cherry, J. A.: A Field Exercise on Groundwater Flow Using Seepage Meters and Mini-piezometers, J. Geol. Educ., 27, 6–10, 1979.

Lewandowski, J., Meinikmann, K., and Krause, S.: Groundwater-surface water interactions: Recent advances and interdisciplinary challenges, Water, 12, 296, 2020.

Massmann, G., Sültenfuß, J., and Pekdeger, A.: Analysis of long-term dispersion in a river-recharged aquifer using tritium/helium data, Water Resour. Res., 45, 2009.

Maxar Team: Maxar – Archive search and discovery, last access: August 2022.

Maxwell, R. M., Kollet, S. J., Smith, S. G., Woodward, C. S., Falgout, R. D., Ferguson, I. M., Baldwin, C., Bosl, W. J., Hornung, R., and Ashby, S.: ParFlow user's manual, International Ground Water Modeling Center Report GWMI, 1, 129 (last access: August 2022), 2009.

McDonald, A. K., Sheng, Z., Hart, C. R., and Wilcox, B. P.: Studies of a regulated dryland river: surface–groundwater interactions, Hydrol. Process., 27, 1819–1828, 2013.

McDonald, M. G. and Harbaugh, A. W.: A modular three-dimensional finite-difference ground-water flow model, vol. 6, US Geological Survey Reston, VA, 1988.

McKerchar, A. I. and Schmidt, J.: Decreases in low flows in the lower Selwyn River?, J. Hydrol., 46, 63–72, 2007.

McMahon, T. A. and Nathan, R. J.: Baseflow and transmission loss: A review, WIREs Water, 8, e1527, 2021.

McMahon, T. A. and Peel, M. C.: Uncertainty in stage–discharge rating curves: application to Australian Hydrologic Reference Stations data, Hydrolog. Sci. J., 64, 255–275, 2019.

McMillan, H., Krueger, T., and Freer, J.: Benchmarking observational uncertainties for hydrology: rainfall, river discharge and water quality, Hydrol. Process., 26, 4078–4111, 2012.

Morel-Seytoux, H. J., Miller, C. D., Mehl, S., and Miracapillo, C.: Achilles' heel of integrated hydrologic models: The stream-aquifer flow exchange, and proposed alternative, J. Hydrol., 564, 900–908, 2018.

Munasinghe, D., Cohen, S., Huang, Y.-F., Tsang, Y.-P., Zhang, J., and Fang, Z.: Intercomparison of Satellite Remote Sensing-Based Flood Inundation Mapping Techniques, J. Am. Water Resour. As., 54, 834–846, 2018.

NIWA: Annual climate summary 2020, Tech. rep. (last access: August 2022), 2020.

NIWA: Annual climate summary 2021 (last access: August 2022), 2021.

Papacharalampous, G. A. and Tyralis, H.: Evaluation of random forests and Prophet for daily streamflow forecasting, Adv. Geosci., 45, 201–208, 2018.

Planet Team: Planet Application Program Interface: In Space for Life on Earth, San Francisco, CA (last access: August 2022), 2017.

Popp, A. L., Pardo-Álvarez, Á., Schilling, O. S., Scheidegger, A., Musy, S., Peel, M., Brunner, P., Purtschert, R., Hunkeler, D., and Kipfer, R.: A Framework for Untangling Transient Groundwater Mixing and Travel Times, Water Resour. Res., 57, e2020WR028362, 2021.

Prickett, T. A. and Lonnquist, C. G.: Selected digital computer techniques for groundwater resource evaluation, Tech. rep. (last access: August 2022), 1971.

Rodriguez-Galiano, V., Mendes, M. P., Garcia-Soldado, M. J., Chica-Olmo, M., and Ribeiro, L.: Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: A case study in an agricultural setting (Southern Spain), Sci. Total Environ., 476, 189–206, 2014.

Rosen, M. R. and White, P. A.: Groundwaters of New Zealand, New Zealand Hydrological Society, ISBN 13 978-0-473-07816-4, 2001.

Rosenberry, D. O., Duque, C., and Lee, D. R.: History and evolution of seepage meters for quantifying flow between groundwater and surface water: Part 1–Freshwater settings, Earth-Sci. Rev., 204, 103167, 2020.

Rupp, D. E., Larned, S. T., Arscott, D. B., and Schmidt, J.: Reconstruction of a daily flow record along a hydrologically complex alluvial river, J. Hydrol., 359, 88–104, 2008.

Rushton, K.: Representation in regional models of saturated river–aquifer interaction for gaining/losing rivers, J. Hydrol., 334, 262–281, 2007.

Rushton, K. R. and Tomlinson, L. M.: Possible mechanisms for leakage between aquifers and rivers, J. Hydrol., 40, 49–65, 1979.

Sagin, J., Sizo, A., Wheater, H., Jardine, T. D., and Lindenschmidt, K.-E.: A water coverage extraction approach to track inundation in the Saskatchewan River Delta, Canada, Int. J. Remote Sens., 36, 764–781, 2015.

Schaper, J. L., Zarfl, C., Meinikmann, K., Banks, E. W., Baron, S., Cirpka, O. A., and Lewandowski, J.: Spatial Variability of Radon Production Rates in an Alluvial Aquifer Affects Travel Time Estimates of Groundwater Originating From a Losing Stream, Water Resour. Res., 58, e2021WR030635, 2022.

Shanafield, M. and Cook, P. G.: Transmission losses, infiltration and groundwater recharge through ephemeral and intermittent streambeds: A review of applied methods, J. Hydrol., 511, 518–529, 2014.

Shanafield, M., Bourke, S. A., Zimmer, M. A., and Costigan, K. H.: An overview of the hydrology of non-perennial rivers and streams, WIREs Water, 8, e1504, 2021.

Shen, C., Chen, X., and Laloy, E.: Editorial: Broadening the Use of Machine Learning in Hydrology, Front. Water, 3, 2021.

Smith, W. and Montgomery, H.: Revolution or evolution? New Zealand agriculture since 1984, GeoJ., 59, 107–118, 2004.

Solomatine, D. P. and Ostfeld, A.: Data-driven modelling: some past experiences and new approaches, J. Hydroinform., 10, 3–22, 2008.

Taylor, C. B., Wilson, D. D., Brown, L. J., Stewart, M. K., Burden, R. J., and Brailsford, G. W.: Sources and flow of north Canterbury plains groundwater, New Zealand, J. Hydrol., 106, 311–340, 1989.

Therrien, R., McLaren, R. G., Sudicky, E. A., and Panday, S. M.: HydroGeoSphere: A three-dimensional numerical model describing fully-integrated subsurface and surface flow and solute transport. Groundwater Simulations Group, University of Waterloo, Waterloo, ON, 830, 2010.

Tran, H., Leonarduzzi, E., Fuente, L., Hull, R., Bansal, V., Chennault, C., Gentine, P., Melchior, P., Condon, L., and Maxwell, R.: Development of a Deep Learning Emulator for a Distributed Groundwater–Surface Water Model: ParFlow-ML, Water, 13, 3393, 2021.

Tyralis, H., Papacharalampous, G., and Langousis, A.: A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources, Water, 11, 910, 2019.

Villeneuve, S., Cook, P. G., Shanafield, M., Wood, C., and White, N.: Groundwater recharge via infiltration through an ephemeral riverbed, central Australia, J. Arid. Environ., 117, 47–58, 2015.

Vincent, C. N.: Hydrogeology of the Upper Selwyn Catchment, Master thesis, 2005.

Walter, G. R., Necsoiu, M., and McGinnis, R.: Estimating Aquifer Channel Recharge Using Optical Data Interpretation, Groundwater, 50, 68–76, 2012.

Walters, M. O.: Transmission Losses in Arid Region, J. Hydraul. Eng., 116, 129–138, 1990.

Wang, P., Pozdniakov, S. P., and Vasilevskiy, P. Y.: Estimating groundwater-ephemeral stream exchange in hyper-arid environments: Field experiments and numerical simulations, J. Hydrol., 555, 68–79, 2017.

Wheater, H. S., Mathias, S. A., and Li, X.: Groundwater modelling in arid and semi-arid areas, Cambridge University Press, 2010.

White, P. A.: Avon River springs catchment, Christchurch City, New Zealand, Aust. J. Earth Sci., 56, 61–70, 2009.

White, P. A., Kovacova, E., Zemansky, G., Jebbour, N., and Moreau-Fournier, M.: Groundwater-surface water interaction in the Waimakariri River, New Zealand, and groundwater outflow from the river bed, J. Hydrol., 51, 1–23, 2012.

Wilson, D. D.: The significance of geology in some current water resource problems, Canterbury Plains, New Zealand, J. Hydrol., 12, 103–118, 1973.

Wilson, S. R., Close, M. E., Abraham, P., Sarris, T. S., Banasiak, L., Stenger, R., and Hadfield, J.: Achieving unbiased predictions of national-scale groundwater redox conditions via data oversampling and statistical learning, Sci. Total Environ., 705, 135877, 2020.

Wöhling, T., Gosses, M. J., Wilson, S. R., and Davidson, P.: Quantifying River-Groundwater Interactions of New Zealand's Gravel-Bed Rivers: The Wairau Plain, Groundwater, 56, 647–666, 2018.

Wöhling, T., Wilson, S., Wadsworth, V., and Davidson, P.: Detecting the cause of change using uncertain data: Natural and anthropogenic factors contributing to declining groundwater levels and flows of the Wairau Plain aquifer, New Zealand, J. Hydrol., 31, 100715, 2020.

Wright, M. N. and Ziegler, A.: ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R, J. Stat. Softw., 77, 1–17, 2017.

Short summary
We present a novel framework to estimate how much water is lost by ephemeral rivers using satellite imagery and machine learning. The framework proved efficient: it required less fieldwork and generated more data than traditional methods, with similar accuracy. Applying it also improved our understanding of water transfer at our study site. The framework is easily transferable to other ephemeral rivers and could be applied to longer time series.