GRAINet: mapping grain size distributions in river beds from UAV images with convolutional neural networks

Lang, Nico; Irniger, Andrea; Rozniak, Agnieszka; Hunziker, Roni; Wegner, Jan Dirk; Schindler, Konrad

doi:https://doi.org/10.5194/hess-25-2567-2021

Articles | Volume 25, issue 5


Research article

19 May 2021



Abstract

Grain size analysis is key to understanding the sediment dynamics of river systems. We propose GRAINet, a data-driven approach to analyze grain size distributions of entire gravel bars based on georeferenced UAV images. A convolutional neural network is trained to regress grain size distributions as well as the characteristic mean diameter from raw images. GRAINet allows for the holistic analysis of entire gravel bars, resulting in (i) high-resolution estimates and maps of the spatial grain size distribution at large scale and (ii) robust grading curves for entire gravel bars. To collect an extensive training dataset of 1491 samples, we introduce digital line sampling as a new annotation strategy. Our evaluation on 25 gravel bars along six different rivers in Switzerland yields high accuracy: the resulting maps of mean diameters have a mean absolute error (MAE) of 1.1 cm, with no bias. Robust grading curves for entire gravel bars can be extracted if representative training data are available. At the gravel bar level the MAE of the predicted mean diameter is even reduced to 0.3 cm, for bars with mean diameters ranging from 1.3 to 29.3 cm. Extensive experiments were carried out to study the quality of the digital line samples, the generalization capability of GRAINet to new locations, the model performance with respect to human labeling noise, the limitations of the current model, and the potential of GRAINet to analyze images with low resolutions.


Received: 27 Apr 2020 – Discussion started: 25 May 2020 – Revised: 05 Feb 2021 – Accepted: 25 Mar 2021 – Published: 19 May 2021

1 Introduction

Understanding the hydrological and geomorphological processes of rivers is crucial for their sustainable development so as to mitigate the risk of extreme flood events and to preserve the biodiversity in aquatic habitats. Grain size data of gravel- and cobble-bed streams are key to advance the understanding and modeling of such processes (Bunte and Abt, 2001). The fluvial morphology of the majority of the world's streams is heavily affected by human activity and construction along the river (Grill et al., 2019). Human interventions like gravel extractions, sediment retention basins in the upper catchments, hydroelectric power plants, dams, or channels reduce the bed load and lead to surface armoring, clogging of the bed, and latent erosion (Surian and Rinaldi, 2003; Simon and Rinaldi, 2006; Poeppl et al., 2017; Gregory, 2019). Consequently, the natural alteration of the river bed is hindered, eventually deteriorating habitats and potential spawning grounds. Moreover, the process of bed-load transport can cause bed or bank erosion, the destruction of engineering structures (e.g., due to bridge scours), or increased flooding due to deposits in the channel that amplify the impact of severe floods (Badoux et al., 2014). What makes modeling of fluvial morphology challenging are the mutual dependencies between the flow field, grain size, movement, and geometry of the channel bed and banks. While channel shape and roughness define the flow field, the flow moves sediments – depending on their size – and the bed is altered by erosion and deposition. This mutually reinforcing system makes understanding channel form and processes hard. Transport calculations in numerical models are thus still based on empirical formulas (Nelson et al., 2016).

One key indicator for modeling sediment dynamics of a river system is the grading curve of the sediment. Depending on the complexity of the model, the grain size distribution is either described by its characteristic diameters (e.g., the mean diameter d_m defined by Meyer-Peter and Müller, 1948) or by the fractions of the grading curve (fractional transport; Habersack et al., 2011). The grain size of the river bed is crucial because it defines the roughness of the channel as well as the incipient motion of the sediment (Bunte and Abt, 2001). Thus, knowledge of the grain size distribution is essential to specify flood protection measures, to assess bed stability, to classify aquatic habitats, and to evaluate geological deposits (Habersack et al., 2011). Collecting the required calibration data to describe the composition of a river bed is time-consuming and costly, since it varies strongly along a river (Surian, 2002; Bunte and Abt, 2001) and even locally within individual gravel bars (Babej et al., 2016; Rice and Church, 2010). Traditional mechanical sieving to classify sediments (Krumbein and Pettijohn, 1938; Bunte and Abt, 2001) requires a substantial amount of skilled labor, and the whole process of digging, transport, and sieving is time-consuming, costly, and destructive. Consequently, it is rarely implemented in practice. An alternative way of sampling sediment is surface sampling along transects or on a regular grid. We refer to Bunte and Abt (2001) for a detailed overview of traditional sampling strategies. A simplified, efficient approach that collects sparse data samples in the field is the line sampling analysis of Fehr (1987), the quasi-gold standard in practice today.¹ This procedure of surface sampling is commonly referred to as pebble counts along transects (Bunte and Abt, 2001). Yet, this approach is still very time-consuming and, worse, potentially inaccurate and subjective (Bunte and Abt, 2001; Detert and Weitbrecht, 2012). Moreover, in situ data collection requires physical access and cannot adequately sample inaccessible parts of the bed, such as gravel bar islands (Bunte and Abt, 2001).
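
To make the notion of a characteristic diameter concrete, the following minimal Python sketch computes a mean diameter from binned fractions of a grading curve. It assumes the commonly used weighted form d_m = Σ d_i Δp_i, where d_i is a representative diameter of size class i and Δp_i its fraction. The bin edges, the use of the arithmetic bin midpoint as d_i, the example values, and the function name are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def mean_diameter(bin_edges_cm, fractions):
    """Weighted mean diameter: sum of bin midpoints times bin fractions."""
    edges = np.asarray(bin_edges_cm, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    if len(edges) != len(fractions) + 1:
        raise ValueError("need one more bin edge than fractions")
    fractions = fractions / fractions.sum()      # normalize to sum to 1
    midpoints = 0.5 * (edges[:-1] + edges[1:])   # assumed representative diameters
    return float(np.sum(midpoints * fractions))

# Hypothetical size classes (cm) and fractions, for illustration only.
edges_cm = [1.0, 2.0, 4.0, 8.0, 16.0]
fractions = [0.1, 0.4, 0.3, 0.2]
d_m = mean_diameter(edges_cm, fractions)
print(f"d_m = {d_m:.2f} cm")
```

Other conventions for the representative diameter of a class (for example a geometric mean of the bin edges) would change the result, so the choice should follow whatever definition the downstream transport model expects.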

An obvious idea to accelerate data acquisition is to estimate grain size distribution from images. So-called photo-sieving methods that manually measure gravel sizes from ground-level images (Adams, 1979; Ibbeken and Schleyer, 1986) were first proposed in the late 1970s. While the accuracy of measuring the size of individual grains may be compromised compared to field sampling, manual image-based sampling brings many advantages in terms of transparency, reproducibility, and efficiency. Since it is nondestructive, multiple operators can label the exact same location. Much research has tried to automatically estimate grain size distributions from ground-level images (Butler et al., 2001; Rubin, 2004; Graham et al., 2005; Verdú et al., 2005; Detert and Weitbrecht, 2012; Buscombe, 2013; Spada et al., 2018; Buscombe, 2019; Purinton and Bookhagen, 2019). In contrast, relatively little research has addressed the automatic mapping of grain sizes from images at larger scale (Carbonneau et al., 2004, 2005; Black et al., 2014; de Haas et al., 2014; Carbonneau et al., 2018; Woodget et al., 2018; Zettler-Mann and Fonstad, 2020), which is needed for practical impact. Monitoring of river systems over time suffers from biases introduced by different operators in the field (Wohl et al., 1996). Hence, objective, automatic methods for large-scale grain size analysis offer great potential for consistent monitoring over time.

Other researchers have proposed to analyze 3D data acquired with terrestrial or airborne lidar or through photogrammetric stereo matching (Brasington et al., 2012; Vázquez-Tarrío et al., 2017; Wu et al., 2018; Huang et al., 2018). However, working with 3D data introduces much more overhead in data processing compared to 2D imagery. Moreover, terrestrial data acquisition lacks flexibility and scalability, while airborne lidar remains costly (at least until it can be recorded with consumer-grade UAVs). Photogrammetric 3D reconstruction is limited by the reduced resolution of the reconstructed point clouds (relative to that of the original images), which suppresses smaller grains. Woodget et al. (2018) have shown that, for small grain sizes, image-based texture analysis is beneficial over roughness-based methods.

While automatic grain size estimation from ground-level images is more efficient than traditional field measurements (Wolman, 1954; Fehr, 1987; Bunte and Abt, 2001), it is commonly less accurate, and scaling to large regions is hard. Threshold-based image analysis for explicit gravel detection and measurements is affected by lighting variations and thus requires much manual parameter tuning. In contrast, statistical approaches avoid explicit detection of grains and empirically correlate image content with the grain size measurement. Although these data-driven approaches are promising, their predictive accuracy and generalization to new scenes (e.g., airborne imagery at country scale) are currently limited by manually designed features and small training datasets.

Figure 1. Illustration of the two final products generated with GRAINet on the river Rhone. Left: map of the spatial distribution of characteristic grain sizes (here d_m). Right: grading curve for the entire gravel bar population, by averaging the predicted curves of individual line samples.

In this paper, we propose a novel approach based on convolutional neural networks (CNNs) that efficiently maps grain size distributions over entire gravel bars, using georeferenced and orthorectified images acquired with a low-cost UAV. This not only allows our generic approach to estimate the full grain size distribution at each location in the orthophoto but also to estimate characteristic grain sizes directly using the same model architecture (Fig. 1). Since it is hard to collect sufficiently large amounts of labeled training data for hydrological tasks (Shen et al., 2018), we introduce digital line sampling as a new, efficient annotation strategy. Our CNN avoids explicit detection of individual objects (grains) and predicts the grain size distribution or derived variables directly from the raw images. This strategy is robust against partial object occlusions and allows for accurate predictions even with coarse image resolution, where the individual small grains are not visible to the naked eye. A common characteristic of most research in this domain is that grain size is estimated in pixels (Carbonneau et al., 2018). Typically, the image scale is determined by recording a scale bar in each image, which is used to convert the grain size into metric units (e.g., Detert and Weitbrecht, 2012) but limits large-scale application. In contrast, our approach estimates grain sizes directly in metric units from orthorectified and georeferenced UAV images.²
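
The bar-level grading curve is described as an average of the curves predicted for individual line samples. A minimal sketch of that aggregation step is given below, assuming each prediction is a histogram over a shared set of size bins. The bin count, the example values, and the function name are hypothetical, and the unweighted mean is an assumption about how samples are combined.

```python
import numpy as np

def bar_grading_curve(sample_distributions):
    """Average per-sample distributions and accumulate them into one curve."""
    dists = np.asarray(sample_distributions, dtype=float)
    # Normalize each row so every sample contributes a proper distribution.
    dists = dists / dists.sum(axis=1, keepdims=True)
    mean_dist = dists.mean(axis=0)          # unweighted mean over samples
    return mean_dist, np.cumsum(mean_dist)  # distribution and grading curve

# Three hypothetical predicted distributions over four size bins.
predicted = [
    [0.2, 0.5, 0.2, 0.1],
    [0.0, 0.3, 0.5, 0.2],
    [0.1, 0.4, 0.3, 0.2],
]
mean_dist, curve = bar_grading_curve(predicted)
print(mean_dist, curve)
```

Because each row is normalized before averaging, the averaged result is again a distribution, and its cumulative sum ends at 1.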

We evaluate the performance of our method and its robustness to new, unseen locations with different imaging conditions (e.g., weather, lighting, shadows) and environmental factors (e.g., wet grains, algae covering) through cross-validation on a set of 25 gravel bars (Irniger and Hunziker, 2020). Like Shen et al. (2018), we see great potential for deep learning techniques in hydrology, and we hope that our research constitutes a further step towards their widespread adoption. To summarize, our presented approach includes the following contributions:

end-to-end estimation of the full grain size distribution at particular locations in the orthophoto, over areas of 1.25 m×0.5 m;
robust mapping of grain size distribution over entire gravel bars;
generic approach to map characteristic grain sizes with the same model architecture;
mapping of mean diameters d_m below 1.5 cm;
robust estimation of d_m, for arbitrary ground sampling distances up to 2 cm.

2 Related work

In this section, we review related work on automated grain size estimation from images. We refer the reader to Piégay et al. (2019) for a comprehensive overview of remote sensing approaches on rivers and fluvial geomorphology. Previous research can be classified into traditional image processing and statistical approaches.

Traditional image processing, also referred to as object-based approaches (e.g., Carbonneau et al., 2018), has been applied to segment individual grains and measure their sizes, by fitting an ellipse and reporting the length of its minor axis as the grain size (Butler et al., 2001; Sime and Ferguson, 2003; Graham et al., 2005, 2010; Detert and Weitbrecht, 2012; Purinton and Bookhagen, 2019). Detert and Weitbrecht (2012) presented BASEGRAIN, a MATLAB-based object detection software tool for granulometric analysis of ground-level top-view images of fluvial, noncohesive gravel beds. The gravel segmentation process includes grayscale thresholding, edge detection, and a watershed transformation. Despite this automated image analysis, extensive manual parameter tuning is often necessary, which hinders the automatic application to large and diverse sets of images. Recently, Purinton and Bookhagen (2019) introduced a Python tool called PebbleCounts as a successor of BASEGRAIN, replacing the watershed approach with k-means clustering.
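
The measurement step of such object-based pipelines can be illustrated with a short sketch. It assumes a grain has already been segmented into a binary mask and fits an ellipse through the second central moments of the mask pixels, which is one common way to do it. This is not the BASEGRAIN or PebbleCounts implementation, and the synthetic mask, the image scale, and the function name are hypothetical.

```python
import numpy as np

def ellipse_axes_px(mask):
    """Major and minor axis lengths (pixels) of the ellipse with the same
    second central moments as the True region of a binary mask."""
    rows, cols = np.nonzero(mask)
    if rows.size < 2:
        raise ValueError("mask needs at least two foreground pixels")
    coords = np.stack([rows, cols]).astype(float)
    cov = np.cov(coords, bias=True)      # 2x2 covariance of pixel coordinates
    eigvals = np.linalg.eigvalsh(cov)    # eigenvalues in ascending order
    # A filled ellipse with semi-axis a has variance a^2 / 4 along that axis,
    # so the full axis length is 4 * sqrt(variance).
    minor, major = 4.0 * np.sqrt(np.maximum(eigvals, 0.0))
    return major, minor

# Synthetic "grain": a filled ellipse with semi-axes of 40 px and 20 px.
yy, xx = np.mgrid[0:101, 0:201]
grain = ((xx - 100) / 40.0) ** 2 + ((yy - 50) / 20.0) ** 2 <= 1.0
major_px, minor_px = ellipse_axes_px(grain)

scale_cm_per_px = 0.25  # hypothetical image scale, e.g. derived from a scale bar
grain_size_cm = minor_px * scale_cm_per_px
print(f"minor axis: {minor_px:.1f} px, grain size: {grain_size_cm:.1f} cm")
```

The hard part in practice is the segmentation that produces the mask, which is where the thresholds and parameter tuning mentioned above come in.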

Statistical approaches aim to overcome limitations of object-centered approaches by relying on global image statistics. Image texture (Carbonneau et al., 2004; Verdú et al., 2005), autocorrelation (Rubin, 2004; Buscombe and Masselink, 2009), wavelet transformations (Buscombe, 2013), or 2D spectral decomposition (Buscombe et al., 2010) are used to estimate the characteristic grain sizes like the mean (d_m) and median (d₅₀) grain diameters. Alternatively, one can regress specific percentiles of the grading curve individually (Black et al., 2014; Buscombe, 2013, 2019).

Buscombe (2019) proposed a framework called SediNet, based on CNNs, to estimate grain sizes as well as shapes from images. The dataset of 409 manually labeled sediment images was split in half into training and test portions, and CNNs were trained from scratch, despite the small amount of data.³

In contrast to previous work, we view the frequency or volume distribution of grain sizes as a probability distribution (of sampling a certain size), and we fit our model by minimizing the discrepancy between the predicted and ground truth distributions. Our method is inspired by Sharma et al. (2020), who proposed HistoNet to count objects in images (soldier fly larvae and cancer cells) and to predict absolute size distributions of these objects directly, without any explicit object detection. The authors show that end-to-end estimation of object size distributions outperforms baselines using explicit object segmentation (in their case with Mask-RCNN; He et al., 2017). Even though Sharma et al. (2020) avoid explicit instance segmentation, the training process is supervised with a so-called count map derived from a pixel-accurate object mask, which indicates object sizes and locations in the image. In contrast, our approach requires neither a pixel-accurate object mask nor a count map for training, which are both laborious to annotate manually. Instead, the CNN is trained by simply regressing the grain size distribution end-to-end. Labeling of new training data becomes much more efficient, because we no longer need to acquire pixel-accurate object labels. Our model learns to estimate object size frequencies by looking at large image patches, without access to explicit object counts or locations.
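
Treating the grain size histogram as a probability distribution means that the training signal can be any discrepancy between two discrete distributions. The sketch below shows two generic candidates, the Kullback-Leibler divergence and the one-dimensional earth mover's distance. This passage does not say which discrepancy is minimized, so both measures, the softmax output, the bin count, and all example values are assumptions chosen for illustration.

```python
import numpy as np

def softmax(logits):
    """Map raw network outputs to a distribution that sums to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) between two discrete distributions."""
    t = np.asarray(target, dtype=float)
    p = np.asarray(predicted, dtype=float)
    support = t > 0  # bins with zero target mass contribute nothing
    return float(np.sum(t[support] * np.log(t[support] / (p[support] + eps))))

def emd_1d(target, predicted):
    """Earth mover's distance for ordered bins of unit width: the summed
    absolute difference of the two cumulative distributions."""
    t = np.cumsum(np.asarray(target, dtype=float))
    p = np.cumsum(np.asarray(predicted, dtype=float))
    return float(np.sum(np.abs(t - p)))

# Hypothetical ground truth histogram and raw model outputs over five size bins.
target = np.array([0.0, 0.2, 0.5, 0.3, 0.0])
predicted = softmax([0.1, 1.0, 2.0, 1.2, 0.1])
loss_kl = kl_divergence(target, predicted)
loss_emd = emd_1d(target, predicted)
print(loss_kl, loss_emd)
```

Unlike the KL divergence, the earth mover's distance takes the ordering of the size bins into account, so mass predicted in a neighboring bin is penalized less than mass predicted far away.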

3 Data

We collected a dataset of 1491 digitized line samples acquired from a total of 25 different gravel bars on six Swiss rivers (see Table B1 in Appendix B for further details). We name gravel bar locations with the river name and the distance from the river mouth in kilometers.⁴ All gravel bars are located on the northern side of the Alps, except for two sites at the river Rhone (Fig. 2). All investigated rivers are gravel rivers with gradients of 0.01 %–1.5 %, with the majority (20 sites) having gradients <1.0 %. The river width at the investigated sites varies between 50 and 110 m, whereby Emme km 005.5 and Emme km 006.5 correspond to the narrowest sites, and Reuss km 017.2 represents the widest one.

Figure 2. Overview map with the 25 ground truth locations of the investigated gravel bars in Switzerland.

One example image tile from each of the 25 sites is shown in Fig. 3. This collection qualitatively highlights the great variety of grain sizes, distributions, and lighting conditions (e.g., shadows, hard and soft light due to different weather conditions). The total number of digital line samples collected per site varies between 4 (Reuss km 021.4) and 212 (Kl. Emme km 030.3), depending on the spatial extent and the variability of grain sizes within the gravel bar.

Figure 3. Example image tiles (1.25 m×0.5 m) with 0.25 cm ground sampling distance. Each of the 25 example tiles is taken from a different gravel bar.
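
As a small worked example of what these numbers imply, the sketch below converts the tile footprint and ground sampling distance (GSD) into pixel dimensions, and a pixel length back into centimeters. The function names are hypothetical. The 2 cm value is the upper GSD bound named in the list of contributions, used here only to show how quickly the pixel count shrinks.

```python
def tile_size_px(width_m, height_m, gsd_cm):
    """Tile width and height in pixels for a given ground sampling distance."""
    if gsd_cm <= 0:
        raise ValueError("gsd_cm must be positive")
    return width_m * 100.0 / gsd_cm, height_m * 100.0 / gsd_cm

def px_to_cm(length_px, gsd_cm):
    """Convert a length measured in pixels to centimeters."""
    return length_px * gsd_cm

native = tile_size_px(1.25, 0.5, 0.25)  # GSD quoted for the example tiles
coarse = tile_size_px(1.25, 0.5, 2.0)   # coarsest GSD named in the contributions
print(native, coarse, px_to_cm(40, 0.25))
```

At 0.25 cm GSD a tile spans 500 by 200 pixels, whereas at 2 cm GSD the same footprint covers only 62.5 by 25 pixels, so individual small grains fall below the pixel size.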


