Flow velocity and discharge measurement in rivers using terrestrial and UAV imagery

An automatic workflow to measure surface flow velocities in rivers is introduced, including a Python tool. The method is based on particle tracking velocimetry (PTV) and comprises an automatic definition of the search area for particles to track. Tracking is performed in the original images; only the final tracks are geo-referenced, by intersecting the image observations with the water surface in object space. Detected particles and corresponding feature tracks are filtered considering particle and flow characteristics to mitigate the impact of sun glare and outliers. The method can be applied to different perspectives, including terrestrial and aerial (i.e. UAV) imagery. To account for camera movements, images can be co-registered automatically. In addition to velocity estimates, discharge is calculated using the surface velocities and the wetted cross-section derived from surface models computed with structure-from-motion and multi-media photogrammetry. The workflow is tested at two river reaches (paved and natural) in Germany. Reference data are provided by ADCP measurements. At the paved river reach the highest deviations of flow velocity and discharge reach 5% and 4%, respectively. At the natural river deviations are larger (up to 31%) due to the irregular cross-section shapes hindering an accurate comparison of ADCP- and image-based results. The provided tool enables the measurement of surface flow velocities independently of the perspective from which images are acquired. With the contact-less measurement, spatially distributed velocity fields can be estimated and river discharge in previously ungauged and unmeasured regions can be calculated.


Data acquisition
Different data were collected during the field campaigns at both river sections. Amongst others, ADCP measurements were performed as a flow velocity reference, GCPs were defined to geo-reference the video data, and UAV and terrestrial imagery was acquired to perform image-based flow velocity estimation.

Reference data
For the ADCP measurements the moving boat approach with a StreamPro from RDI was used. Velocity profiles were measured with a blanking range of 14 cm near the water surface. Data were processed using the AGILA software from the German Federal Institute of Hydrology (BfG). Measurements along the boat track were projected onto a reference cross-sectional area.
Afterwards, surface flow velocities were extrapolated. The spatial variation of flow velocities is larger at the Freiberger Mulde, where measurements were performed in a natural river reach, than at the Wesenitz, where data were captured at a standardized gauge station. Thus, only one profile was measured at the Wesenitz. The discharge at the Freiberger Mulde is 5.88 m³/s on average but reveals a standard deviation of 0.25 m³/s. Estimated discharge of the river reach therefore reveals a variation of about 4%, which can be attributed to inconsistencies during data acquisition. A decrease of the velocity coefficient, derived from the ADCP measurements, with decreasing water depth is observed: at the Freiberger Mulde, cross-sections 1 and 3 (Table 1) have lower water depths than profile 2 and thus lower velocity coefficients.
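For illustration, the relation between surface and depth-averaged velocity behind such a coefficient can be sketched as follows; the power-law profile shape and all numbers are hypothetical, not the ADCP data of this study:

```python
import numpy as np

# Hypothetical vertical velocity profile: heights above the bed (m)
# for a water depth of 1.0 m, with a 1/6 power-law shape
h = np.linspace(0.05, 1.0, 20)
v = 1.2 * (h / 1.0) ** (1.0 / 6.0)   # surface velocity v[-1] = 1.2 m/s

# Depth-averaged velocity via trapezoidal integration over the vertical
v_mean = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(h)) / (h[-1] - h[0])

# Velocity coefficient: ratio of depth-averaged to surface velocity
alpha = v_mean / v[-1]               # close to 6/7 for a 1/6 power law
```

A coefficient of this kind, estimated per ADCP profile, is what later scales the image-based surface velocities to depth-averaged values.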

Image-based data
At both river reaches video sequences were acquired with terrestrial cameras and with a camera mounted on the UAV AscTec Falcon 8. The airborne image data was captured at flying heights of about 20 m and 30 m at the Wesenitz and Freiberger Mulde, respectively. Videos were captured with a frame rate of 25 frames per second (fps) and a resolution of 1920 x 1080 pixels using a Sony NEX-5N camera with a fixed 16 mm lens. The ground sampling distance (GSD) is about 7 mm at the Wesenitz and about 9 mm at the Freiberger Mulde.
The terrestrial cameras were installed at bridges across the river (fig. 1a). At the Wesenitz three cameras were installed to evaluate the performance of different cameras (fig. 1b): two Canon EOS 1200D and one Canon EOS 500D. The 1200D cameras captured video sequences with 25 fps and a resolution of 1920 x 1080 pixels. The 500D captured frames with a higher rate (30 fps) and a smaller image resolution (1280 x 720 pixels). All three cameras were facing downstream. At the Freiberger Mulde the camera Casio EX-F1, equipped with a zoom lens fixed to 7.5 mm, was used. Videos were captured with 30 fps and a resolution of 640 x 480 pixels. The camera was facing upstream.
The terrestrial cameras were calibrated for both rivers to allow for the correction of image distortion. To estimate the interior geometry of the cameras, images of an in-house calibration field were captured in a specific calibration pattern (Luhmann et al., 2014). These images, together with approximate coordinates of the calibration field and approximations of the interior camera orientations, were used in a free-network bundle adjustment within Aicon 3D Studio to calibrate each camera.
More details regarding the workflow are given in Eltner et al. (2015).

High resolution topography of the river reaches
Local 3D models describing the topography of the river reaches are necessary to scale the image measurements. Therefore, high-resolution topography data was acquired at both rivers using SfM photogrammetry (Eltner et al., 2016; James et al., 2019). SfM in combination with multi-view stereo matching (MVS) allows for the digital reconstruction of the topography from overlapping images and some GCPs. Thereby, homologous image points in overlapping images are detected and matched automatically. From these homologous points and some assumptions about the interior camera model, the position and orientation of each captured image (i.e. camera pose) can be calculated. With known network geometry, a dense point cloud can be computed, reconstructing the 3D information for almost every image pixel. The resulting 3D models are geo-referenced during the reconstruction or afterwards via GCPs.
At the Wesenitz the 3D surface model of the river reach was calculated from 85 terrestrially captured images with a Canon EOS 600D (20 mm fixed lens) and from 20 UAV images (Eltner et al., 2018). The SfM calculations were performed in Agisoft Metashape. At the Freiberger Mulde seven frames of the video sequence, which is also used for the later PTV processing, were utilized to perform SfM photogrammetry to retrieve the corresponding 3D model of the river reach. GCPs made of white circles on a black background were installed in order to reference the 3D data as well as the image-based velocity measurements. They were measured with a total station at the Freiberger Mulde and during the first campaign at the Wesenitz. During the second campaign at the Wesenitz GCPs were extracted from cobblestone corners (with sufficient contrast) at the gauge, which are visible in the terrestrial images used for the 3D model reconstruction. GCPs were measured in at least five images for sufficient redundancy and thus more reliable coordinate calculation.
The bathymetric information of the river reaches was retrieved using the same UAV data as for the topographic information above the water level. Refraction impacts are accounted for using the tool provided by Dietrich (2017). Therefore, underwater points, camera poses and interior camera parameters as well as the water level need to be provided. The corrected point clouds can be noisy and were therefore filtered and smoothed in CloudCompare, using a statistical outlier filter to detect isolated points and a moving least squares filter.

Flow Velocity Workflow
This section introduces the general approach to measure river flow velocities from either terrestrial or airborne video sequences; essential processing steps are described in more detail. The method is implemented in Python using the OpenCV library (Bradski, 2000). Fig. 2 illustrates the entire data processing workflow of the tool.

Frame preparation
Video sequences are converted into individual frames prior to the data processing. Afterwards, image co-registration is necessary if the camera is not stable during video capturing, as is the case for the UAV data. Each frame of the entire video sequence is co-registered to the first frame of the same sequence to correct for camera movements and thus to ensure that all frames capture the same scene. This processing step is performed fully automatically. In each frame Harris corner features are detected (Harris and Stephens, 1988), which are then matched to the first frame of the sequence using SIFT descriptors (Lowe, 2004).
Harris features in the water region are rejected as outliers because their appearance changes between subsequent frames, leading to matching failures. If moving features in the water area are nevertheless matched, they are filtered at the latest during the parameter estimation of the homography, because these points are considered outliers during the model fitting with RANSAC (Fischler and Bolles, 1981). Thus, only stable and reliable homologous image points outside the river are kept and used to calculate the homography parameters between the first frame and all subsequent frames. Finally, a perspective transformation is applied to ensure that all frames fit to the first image. Note that this approach only works as long as enough stable areas are visible on both river banks.

Finding features to track
A search area in the river region has to be defined to detect particles before tracking. Applying particle detection to the entire image would lead to the detection of interest points in regions of higher contrast outside the water area, as most feature detectors look for regions of high contrast. Thus, the water area needs to be masked in a first processing step.
Feature search area and pose estimation: The feature search area is a region of interest that is defined as a function of the water level to mask the image. The water level and a 3D model of the river reach (fig. 3a) observed by the camera have to be known to define this water area. The 3D model is clipped with the water level value to keep solely the points below the water surface. Afterwards, these points are projected into image space (fig. 3b). Therefore, information about the pose and the interior geometry of the camera is necessary.
The point cloud is projected into a 2D image (fig. 3b). To fill gaps, potentially arising for 3D models with low resolution, a morphological closing is performed. Finally, the contour of the underwater area is extracted to define the search mask for the individual frames. If several contours are detected, the largest contour is chosen.
Feature detection and filtering: Particles are detected with the Shi-Tomasi feature (or good features to track; GFTT) detector (Shi and Tomasi, 1994). Thereby, features are detected similarly to the Harris corner detector, but a different score is considered to decide for a valid feature (fig. 3c). Many more feature detectors are possible. Tauro et al. (2018) test several methods and show that the GFTT detector performs well and also finds features in regions of poor contrast. Particles that are not suitable for tracking have to be eliminated. For instance, reflections of sunlight at waves, showing high contrasts on the water surface, need to be removed to avoid the erroneous tracking of fake particles (Lewis and Rhoads, 2015). Therefore, a nearest neighbour search is performed to find areas with strong clusters of particles. If there are too many features within a defined search radius, the particle is excluded from further analysis. In addition, features are removed that reveal brightness values below a threshold, e.g. to avoid the inclusion of wave shadows as features.

Feature tracking
When features have been detected, they are tracked through subsequent frames (fig. 4). This tracking is performed using normalised cross-correlation (NCC). The normalisation accounts for brightness and illumination differences between frames. The positions of the detected features define templates with a specific kernel size (10 pixels in this study). In the next frame NCC is performed within a defined search area (15 pixels in this study) to find the position with the highest correlation score for each feature, potentially corresponding to the new position of the migrated particle on the water surface.
To refine the matching, an additional subpixel-accurate processing step is performed. Thereby, the template and the matched search area of the same size are converted into the frequency domain to measure the phase shift between them. The final matched locations define the new templates for tracking in the next frame. This tracking approach is performed for a specified number of frames. In this study, features are tracked for 20 frames and new features are detected every 15th frame. Figure 4 shows that false tracking results can still occur, e.g. tracks that significantly deviate from the main flow direction.

This is, amongst other reasons, due to remaining speckle detected as features or due to the tracking of low-contrast features leading to ambiguous matching scores. Therefore, the resulting velocity tracks need to be filtered. Tauro et al. (2018) remove false trajectories considering minimum track length and track orientation. In this study, we make four assumptions about the flow characteristics of the river (fig. 5). Thereby, each track is the combination of the individual sub-tracks from frame to frame, with feature detection performed in the first frame.

The first filter criterion is the distance across which features were tracked, comprising thresholds for minimum and maximum distances. The distance thresholds can be roughly approximated when the image scale and the range of expected river flow velocities are known. In this study, the minimum and maximum distance parameters are set to zero and ten pixels, respectively. The second criterion considers the minimum number of frames across which the features have to be traceable (here 13 frames). The third and fourth criteria consider parameters of the flow direction. On the one hand, the steadiness of the flow behaviour is analysed. Therefore, the directions of sub-tracks (from frame to frame) are analysed for each track. Tracks are excluded when the standard deviation or the range of these sub-track directions is above defined thresholds (here 30° and 120°, respectively).
On the other hand, the main flow direction of the river is examined by calculating the average direction of all tracks. If the direction of a track deviates from this average by more than a buffer threshold (here 30°), it is rejected for further processing. In the last processing stage, measured distances are transformed from pixel values to metric units to retrieve flow velocities in m/s. With known camera pose and interior camera geometry, image measurements can be projected into object space. This leads to a 3D representation of the light ray emerging from the image plane and proceeding through the camera's projection centre. 3D object coordinates of an image measurement can be calculated by intersecting its ray with a 3D model of the river. In this case the water surface, assumed as planar at the water level, defines the location of intersection. The starting and ending points of each track are intersected with the water plane to retrieve real-world coordinates. From the distance between start and end, and considering the camera's frame rate as well as the number of tracked frames, metric flow velocities are retrieved. Finally, the metric velocity tracks are filtered once more with a statistical filter to remove remaining outliers (fig. 6). The threshold is defined as the average velocity plus a multiple of its standard deviation. This processing step is more important for challenging tracking situations. For a better visualisation, the final flow velocity tracks are rasterized.
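The geo-referencing of a track reduces to a ray-plane intersection, which can be sketched as follows; the camera parameters, track length and image coordinates are hypothetical:

```python
import numpy as np

def image_to_water_plane(uv, K, R, C, water_level):
    """Intersect the viewing ray of image point uv with the horizontal
    water plane z = water_level. R rotates object to camera coordinates,
    C is the projection centre in object space, K the interior geometry."""
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_obj = R.T @ ray_cam                  # ray direction in object space
    s = (water_level - C[2]) / ray_obj[2]    # scale factor to reach the plane
    return C + s * ray_obj

# Hypothetical nadir-looking camera 10 m above a water surface at z = 0
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])   # camera z-axis pointing down
C = np.array([0.0, 0.0, 10.0])

# Track start and end in image space (25 px apart), 20 frames at 25 fps
start = image_to_water_plane((320.0, 240.0), K, R, C, 0.0)
end = image_to_water_plane((345.0, 240.0), K, R, C, 0.0)
fps, n_frames = 25.0, 20
velocity = np.linalg.norm(end - start) / (n_frames / fps)   # m/s
```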

Regarding tracking reliability, it should be noted that in the case of terrestrial cameras with an oblique view onto the river, velocity measurements are preferably made closer to the sensor. Particles move across a larger number of pixels in close range to the camera than at further distances; e.g. an erroneous measurement of 1 pixel might correspond to a measurement error of 1 cm close to the camera, whereas at a further distance it can correspond to 1 m. Furthermore, tracking accuracy decreases significantly at far ranges due to increasingly glancing ray intersections with the water surface.

Discharge estimation
The bathymetric information as well as the flow velocities are needed to calculate the discharge. Thereby, UAV data alone can be used, as shown by Detert et al. (2017). In this study, we cut river cross-sections from the reconstructed bathymetry and topography at the approximate locations of the ADCP measurements. Afterwards, we extract the water level information by manually detecting the water line in at least three overlapping images and spatially intersecting these point measurements in object space.
The surface flow velocity values are multiplied with the velocity coefficients estimated from the ADCP measurements (Table 1). This approach is suitable at the Wesenitz, but at the Freiberger Mulde the method is restricted due to the irregular river cross-sections limiting the application of a constant velocity coefficient. Finally, discharge is estimated by multiplying the cross-section area with the depth-averaged velocity.
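Putting these pieces together, the discharge estimate reduces to the product of wetted area and depth-averaged velocity; the cross-section geometry, surface velocity and velocity coefficient below are hypothetical values, not the measured ones:

```python
import numpy as np

# Hypothetical cross-section: station (m) and bed elevation (m) along it
station = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
bed = np.array([1.0, 0.2, 0.0, 0.0, 0.3, 1.0])
water_level = 0.8

# Wetted depth and cross-section area (trapezoidal integration)
depth = np.clip(water_level - bed, 0.0, None)
area = np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(station))

# Depth-averaged velocity from surface velocity and an ADCP-derived
# velocity coefficient (a single constant here, which is the very
# simplification that breaks down at irregular profiles)
surface_velocity = 0.9       # m/s, e.g. mean of the PTV tracks at the profile
velocity_coefficient = 0.85
discharge = area * velocity_coefficient * surface_velocity   # m³/s
```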

Results and Discussion
In this study, accuracies of the image processing are presented, tracked flow velocities are evaluated, and discharge estimates are analysed.

Accuracy assessment
To enable an accurate measurement of flow velocities it is necessary to consider how well the camera pose has been estimated.
Furthermore, for cameras in motion the accuracy of frame co-registration has to be evaluated as well, to ensure that tracked movements of the particles indeed correspond to river flow instead of camera movements. The accuracy of camera pose estimation can be assessed because more than three GCPs are available. At both river reaches accuracies are better for the terrestrial cameras (table 2), which is due to a finer GSD, as the cameras are significantly closer to the area of interest compared to the UAV cameras. At the Wesenitz, another reason for the larger deviations is the circumstance that well-marked, artificial GCPs were used for the terrestrial images, whereas GCPs were extracted from the 3D models to estimate the UAV camera pose, leading to lower point coordinate accuracies.
Small template regions (10 pixels in size) in stable areas have been chosen (fig. 7) to estimate the accuracy of frame co-registration. At the Freiberger Mulde only GCPs could be used as templates because the remaining area of interest is covered by vegetation that changes frequently. At the Wesenitz cobblestone corners close to the river surface are chosen because it is important to see how well the co-registration performs close to the water body for which velocities are estimated. Each extracted reference location is tracked through the frame sequence via NCC. In case of a perfect alignment, the templates should remain at the same image location throughout the sequence. In this study, at the Freiberger Mulde the average deviation between tracked frames and the first frame amounts to 0.5 ± 0.6 pixels for all templates, which corresponds to a co-registration accuracy of 4.3 ± 5.2 mm. Interestingly, an impact of the missing camera calibration of the UAV images is not obvious. Lens distortion parameters were only modelled for the terrestrial cameras but were discarded for the UAV camera. The impact is assumed to be minimal because camera distortion is usually especially large for cameras with very wide angles, which is not the case for the UAV camera.
Furthermore, the distortion impact is more important when features are tracked for large distances in the image, which is also not the case in this study because features are mostly tracked between subsequent frames for only a few pixels.

The terrestrial camera at the Freiberger Mulde depicts a lower spatial density of velocities compared to the terrestrial cameras at the Wesenitz (table 2), although the video sequence has a comparable length. This is due to the significantly lower image resolution as well as the larger distance to the object; therefore, fewer features are detectable. The average flow velocity at cross-section 1 as well as the average of the contrasted individual velocity tracks are smaller than the reference. However, the error behaviour of the image-based data might be less favourable at cross-section 1, where the comparison is made for image tracks measured in the far range of the image. The sharp glancing angles at the water surface lead to higher uncertainties of the corresponding 3D coordinates.
The decision about how to set the parameters for tracking (e.g. patch size) and filtering (e.g. statistical threshold) remains challenging, especially in long-term applications when spatio-temporal flow conditions can change strongly (Hauet et al., 2008). Thus, in future studies intelligent decision approaches for the corresponding parameters need to be developed, for instance where measurements are performed iteratively with changing parameters.

Camera based discharge retrieval
Discharge estimates at the Wesenitz do not show large deviations between the cameras because the velocity estimates showed low deviations as well (table 4). Solely camera 1200D-II displays a lower discharge. The average discharge for all cameras amounts to 2.7 m³/s, which corresponds to the ADCP measurement. Deviations from the reference are below 4%, highlighting the great potential of UAV applications to retrieve discharge estimates solely from image data at regular river cross-sections.

Standard deviations of the discharge estimates, resulting from the standard deviation of the surface flow velocities, are small, ranging from 0.18 m³/s (7%) at the Wesenitz to 0.56 m³/s (8%) at the Freiberger Mulde (table 4).
At the Freiberger Mulde, discharge estimates do not fit the reference measurements as well. Velocities are only observed in the main flow of the river, where flow velocities are higher. Deviations from the ADCP reference are larger for the terrestrial camera, whose measurements are only compared to profile 1, which shows the largest range of flow velocities and depicts very low values outside the main flow (fig. 1). Comparing single velocity values to nearby ADCP measurements, instead of averaged cross-section information, reveals that the accuracies of the image-based velocity measurements are indeed higher (table 3). Neglecting the slower flow velocities in the shallower river region outside the main flow leads to overestimated discharge values for the irregularly shaped cross-sections, in contrast to the regular cross-section at the Wesenitz. In addition, using the average velocity coefficient is adverse because the irregular profile shape indicates a changing coefficient (Kim et al., 2008).
The higher error of the flow velocity estimation from the terrestrial camera images, due to the unfavourable perspective in the far range of the camera, also leads to the highest deviations of discharge compared to the UAV imagery. Another important issue is that the image-based discharge estimation is highly sensitive to the wetted cross-section extracted with the defined water level. For instance, at the Wesenitz an offset of only 1 cm in the water level causes a discharge difference of 0.08 m³/s (3%), and 3 cm causes a difference of 0.2 m³/s (7%). Different studies already highlight that the correct water level is important for an accurate discharge estimation due to the error of the wetted cross-section area, but that it is less relevant for the accuracy of the flow velocities due to erroneous ortho-rectification (Dramais et al., 2011; Le Boursicaud et al., 2016; Leitao et al., 2018).

Conclusions
In this study, we introduce a remote sensing workflow for automatic flow velocity and discharge calculation. The approach can be applied to terrestrial as well as aerial imagery; thus, the importance of the acquisition scheme is secondary. However, the visibility of tracked particles across the entire river cross-section is relevant, as indicated by the comparison of three different terrestrial cameras observing nearly the same river reach but revealing variations in the velocity estimates. Camera movements during the video acquisition are stabilized using an automatic image co-registration method. To estimate flow velocity, particles on the water surface are detected and tracked using PTV. A feature search area is defined automatically, solely relying on information about the water level and the topography of the river reach. The detected and tracked particles are filtered with a cluster analysis and by making assumptions about the flow characteristics. Discharge is retrieved using the depth-averaged flow velocity and the wetted cross-section, which is derived from a 3D model reconstructed with multi-media photogrammetry applied to UAV imagery.
Two study sites have been observed with different terrestrial cameras and with a UAV platform. Comparing the results with ADCP reference measurements reveals a high accuracy potential for surface flow velocities calculated with PTV and automatic image co-registration, especially at standard gauging setups (maximum error of 5%). At irregular cross-sections the accuracy assessment of velocity tracking is limited due to the high demands on the positional accuracy of the reference measurements. Discharge estimates with errors below 4% could be achieved at the standardized gauge cross-section. At irregular profiles the discharge calculation reveals significantly larger differences to the reference measurements of 7-31%. This is, amongst other reasons, due to incomplete velocity measurements across the entire river cross-section, leading to discharge overestimation when tracks are