A comparison of tools and techniques for stabilising UAS imagery for surface flow observations

Abstract

While the availability and affordability of unmanned aerial systems (UASs) have led to the rapid development of remote sensing applications in hydrology and hydrometry, uncertainties related to such measurements are still to be quantified and mitigated. Physical instability of the UAS platform inevitably induces motion in the acquired videos and can have a significant impact on the accuracy of camera-based measurements such as velocimetry. A common practice in the data preprocessing stage is the compensation of platform-induced motion by means of digital image stabilisation (DIS) methods, which use the visual information from the captured videos – in the form of physically static features – to first estimate and then compensate for such motion. Most existing stabilisation approaches rely either on in-house tools built on different algorithms, or on general-purpose commercial software. An intercomparison of different stabilisation tools for UAS remote sensing purposes, which could serve as a basis for selecting a particular tool in given conditions, has not been found in the available literature. In this paper we attempt to summarise and describe several freely available DIS tools applicable to UAS velocimetry. A total of seven tools – six aimed specifically at velocimetry and one general-purpose software package – were investigated in terms of their (1) stabilisation accuracy in various conditions, (2) robustness, (3) computational complexity, and (4) user experience, using three case study videos with different flight and ground conditions. In an attempt to adequately quantify the accuracy of stabilisation with different tools, we also present a comparison metric based on root-mean-squared differences (RMSD) of interframe pixel intensities for selected static features. The most apparent differences between the investigated tools were found in the method for identifying and selecting static features in videos: manual or automatic selection. State-of-the-art methods that rely on automatic feature selection require fewer user-provided parameters and are able to select a significantly higher number of potentially static features (by several orders of magnitude) than methods requiring manual identification of such features. This allows the former to achieve a higher stabilisation accuracy, but manual feature selection methods have demonstrated lower computational complexity and better robustness in complex field conditions. While this paper does not intend to identify the optimal stabilisation tool for UAS-based velocimetry, it does aim to shed light on implementational details which can help engineers and researchers choose the tool best suited to their needs and specific field conditions. Additionally, the RMSD comparison metric presented in this paper can also be used to measure the velocity estimation uncertainty induced by UAS motion.

Introduction
The application of unmanned aerial systems (UASs; often termed unmanned aerial vehicles, UAVs) for large-scale image velocimetry is in rapid expansion due to several key factors: (1) the reduction of UAS production costs, (2) technological advances in digital photography and videography, and (3) the development and improvement of various velocimetry methods (Manfreda et al., 2018, 2019; Pearce et al., 2020). To perform an adequate velocimetry analysis using UAS video data, the relationship between the real-world coordinates and the points in the video's region of interest (ROI) should be constant throughout the entire frame sequence. These conditions are not practically attainable with or without a camera gimbal given the current state of UAS technology, because even small camera movements during high-altitude flights – caused by vibrations of the UAS, wind-induced turbulence, issues with GPS positioning, and operator inexperience – can result in large apparent displacements of features in the ROI. Similar issues can arise with videos obtained using handheld devices, and even with terrestrial cameras. To reduce motion-induced errors, it is necessary to stabilise the UAS-acquired video onto a fixed frame of reference prior to the velocimetry analysis.

Image stabilisation can be achieved with two approaches: (1) mechanical stabilisation of the UAS platform and/or camera, or (2) digital image stabilisation, DIS (Engelsberg and Schmidt, 1999; Wang et al., 2011). Since the capabilities of the former are limited to low-intensity vibrations and movement, DIS is commonly used as a part of the video preprocessing stage (Detert and Weitbrecht, 2015; Fujita and Notoya, 2015). Pioneering works of Morimoto and Chellappa (1996a, 1996b, 1998) proposed a stabilisation procedure based on estimating the interframe movement of a small number of image features. Using the information on the movement of local features, global motion of the image can be estimated, assuming that the local interframe motion is sufficiently low. While the concept of feature tracking in videos is not novel, the algorithms for feature selection and tracking have evolved significantly over time.

Generally, DIS methods perform either a 2D-to-2D transformation (plane-to-plane, homography), or a complete reconstruction of camera motion in 3D space. The latter methods usually rely on structure-from-motion (SfM) techniques to estimate the camera path in 3D space. In the case of image velocimetry for open channel flow, detailed information on the surface terrain would have to be generated in order to reconstruct the 3D camera path. Such an approach is computationally complex (Liu et al., 2009, 2011, 2012) and is not used as widely as the 2D methods. In large-scale UAS velocimetry, camera motion is mostly limited to the horizontal plane (translation and yaw rotation), while the amount of other types of motion – such as pitch and roll rotation or scaling of the image – is generally low for level flight in favourable weather conditions.

To date and to the authors' knowledge, there are no standardised DIS procedures for velocimetry purposes, and many researchers rely on in-house developed solutions or general-purpose off-the-shelf video editing software. Moreover, no intercomparison of different DIS tools appears to be available for UAS velocimetry.
Hence, the aim of this research is threefold: (1) the development of novel metrics for quantifying the magnitude and direction of camera motion, (2) the presentation and comparison of seven approaches for compensating the apparent motion in videos using three case studies with different forms and intensities of camera motion, and (3) an assessment of the user experience, computational demands, and limitations of the selected tools. It is important to note that the aim of the current research is not the development of an optimal stabilisation algorithm, but rather a comparison of the performance of a number of freely and publicly available stabilisation tools. As each tool is implemented by a different author using various algorithms and metrics, and given the rapid development of such tools, this research does not focus extensively on implementational details, but rather aims to provide a general comparison of the accuracy and limitations of each tool, as well as some general guidelines for best use in different scenarios.

Materials and methods
Stabilisation tools examined in this research aim to analyse a sequence of frames from UAS videos for the displacement of a finite number of static features in the ROI, and to annul such movement by transforming the ROI from every frame onto a reference (constant) coordinate system. The reference system is commonly defined by the initial frame of the image sequence but can also be defined manually – usually when spatial positions of certain points in the ROI are known. The most significant differences between the available 2D stabilisation methods lie in their approaches to selecting and tracking the movement of static features, and as such they can be separated into two groups: (1) approaches with manual selection of static features and automated estimation of their displacement in subsequent frames (feature tracking) using various metrics, and (2) approaches with automatic selection of features from the entire image based on method-specific criteria, with the displacement estimation performed using binary feature matching techniques. Since the feature tracking in the first group is usually based on comparing image subareas from subsequent frames, these methods can also be described as area-based. Approaches that automatically select well-defined features often describe such features using descriptors, i.e. vectors of specifically derived values which aim to uniquely describe the shape and orientation of the feature. In such cases, these algorithms can be described as feature-based. In either approach, static features can be either artificial ground control points (GCPs) or other motionless, visually well-defined features.
In this research, implementations of the following manual stabilisation algorithms were investigated (with corresponding abbreviations used hereinafter):

1. FFT-CUAS: feature tracking using Fast Fourier Transform (FFT) cross-correlation, built on the PIVlab MATLAB extension (Thielicke and Stamhuis, 2014),
2. FFT-DCH: feature tracking using FFT cross-correlation, built on OpenPIV (Liberzon et al., 2020),
3. SSIMS: feature tracking using the Structural Similarity (SSIM) index (Wang et al., 2004), implemented by the SSIM Stabilisation Tool (SSIMS),
4. KLT-IV: Kanade-Lucas-Tomasi feature tracking implemented by the tool KLT-IV 1.0 (Perks, 2020),
5. Blender/M: off-the-shelf video editing suite in the 3D computer graphics software Blender (also capable of automatic feature selection, denoted by Blender/A).

It is important to note that, even though features in the KLT-IV approach are automatically selected by the "Good features to track" algorithm (Shi and Tomasi, 1994), the tool requires manual delineation of small subareas in which the features can be found. Therefore, KLT-IV was placed in the group with manual feature selection approaches.
Along with the stabilisation tools that employ a manual selection of static features, two implementations of stabilisation algorithms with automatic feature selection were investigated:

6. FAST: using the FAST algorithm (Rosten and Drummond, 2006) implemented in MATLAB,
7. AKAZE: using the AKAZE algorithm (Alcantarilla et al., 2013) implemented by FlowVeloTool (Eltner et al., 2020).

The general outline of the image stabilisation algorithms used in this research can be summarised in three steps (a minimal code sketch of this generic workflow is given at the end of this subsection):

1. The video is split into individual frames for further analysis; however, KLT-IV and Blender can sequentially select frames from the video, thus eliminating the need for additional storage space for individual frames.
2. Well-defined static features are manually or automatically selected from a reference frame, and their positions are tracked in all subsequent frames.
3. Using the positions of the matched static features in the reference and current frame, relative camera motion can be estimated, and image transformation algorithms can be applied to spatially align the frames with respect to a reference coordinate system.

Stages 1 and 3 described above are invariant for all image stabilisation algorithms. The stabilisation performance is generally determined by the accuracy of the feature tracking stage (stage 2), and to a smaller extent by the choice of the image transformation method in stage 3 and the quality and distinctiveness of the detected features in stage 1. In this research, only 2D transformation methods are considered – similarity, affine, projective – as these methods are used almost exclusively for UAS videos (Baek et al., 2019; Detert et al., 2017; Perks et al., 2016; Tauro et al., 2016).

With regards to the image transformation stage, two approaches to selecting the reference coordinate system are generally possible, and both were investigated in this research:

1. Fixed coordinate system: the reference system is defined by a single frame, usually the initial frame of the video. This option is the more accurate of the two, as no information is lost as the feature detection/tracking propagates through the frame sequence – the algorithm always tries to match the features to the original features from the initial frame. However, this approach is reliable only when no significant rotation or scaling of the ROI is present.
2. Updated coordinate system: the reference system is updated after each frame with the positions of newly detected features. This is a more robust approach in cases of substantial rotation and/or scaling of the ROI, at the cost of lower stabilisation accuracy than the fixed coordinate system approach.

In the following sections, the general workflow of tools using manual and automatic feature selection is presented, along with short discussions of the functionalities of each algorithm/tool.
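To make the generic three-step workflow concrete, the following minimal Python/OpenCV sketch implements the fixed-coordinate-system variant: candidate features are selected in the initial frame, tracked into every subsequent frame, and a projective transformation maps each frame back onto the reference system. The function names are standard OpenCV calls, but the file names and all numeric parameters are illustrative assumptions and do not correspond to any of the tools compared here.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("uas_video.mp4")        # hypothetical input file
ok, ref = cap.read()                           # step 1: frames read sequentially
ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)

# Step 2a: select well-defined candidate features in the reference frame
ref_pts = cv2.goodFeaturesToTrack(ref_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=30)

frame_idx = 1
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Step 2b: track the reference features into the current frame
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(ref_gray, gray, ref_pts, None)
    good_ref = ref_pts[status.ravel() == 1]
    good_cur = cur_pts[status.ravel() == 1]

    # Step 3: estimate a projective transform (RANSAC rejects outliers) and
    # warp the current frame back onto the fixed reference coordinate system
    H, _ = cv2.findHomography(good_cur, good_ref, cv2.RANSAC, 3.0)
    stabilised = cv2.warpPerspective(frame, H, (frame.shape[1], frame.shape[0]))
    cv2.imwrite(f"stabilised_{frame_idx:05d}.png", stabilised)
    frame_idx += 1
```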

Manual feature selection approach
Considering that camera motion relative to the ROI can usually be estimated by tracking a relatively small number of static features, a number of available tools employ manual selection of static features (repositories are listed in Table A3 at the end of the paper). Static features are selected by delineating suitable image areas (interrogation areas, IAs) in which they are contained, after which each of the investigated tools searches through the neighbouring areas (search areas, SAs) in the subsequent image in order to estimate their interframe displacement. Key differences between the tools were found in the metric used for displacement estimation. Once the new feature positions are estimated, the positions of the search areas are usually updated for the following image (FFT-CUAS, FFT-DCH, SSIMS, and Blender/M), although some tools, such as KLT-IV, employ a pyramidal KLT approach which enables the search for features to be performed across sufficiently large image subareas. With either strategy, the estimation of selected feature positions propagates through the image sequence one frame at a time. Some specific details of each examined tool are presented in the following sections, while more detailed descriptions are available in the repositories of the individual tools (Table A3). A general outline of the manual feature selection approach is summarised in Fig. 1.

FFT-based tools: FFT-CUAS and FFT-DCH
FFT-based tools employ Fast Fourier Transform (FFT) cross-correlation in interrogation areas (IAs) around static features that experience apparent movement in order to estimate feature displacement from frame to frame. FFT-CUAS (https://bitbucket.org/SIENA_Research/fishstream) is based on the capabilities of the freely available PIVlab MATLAB extension (Thielicke and Stamhuis, 2014), which is widely applied in the scientific community for particle image velocimetry (PIV) analysis (Dal Sasso et al., 2020; Detert et al., 2017; Detert and Weitbrecht, 2015; Lewis et al., 2018). The algorithm can be run in several iterations in order to increase the signal-to-noise ratio of the cross-correlation.
The size of the search area must be selected based on the expected frame-to-frame apparent motion of static features. Since the frame-to-reference displacement of static features may increase with the number of frames processed, the locations of the search areas are updated with each frame by centring them at the locations of the tracked static features. This makes it possible to account for apparent motion of larger magnitude, when the frame-to-reference displacement of static features exceeds half of the selected search area size. The image preprocessing/filtering available in PIVlab can be applied to the original images to increase the accuracy of FFT peak detection by enhancing image contrast and decreasing image noise. In order to increase the accuracy of the displacement calculation, a 2x3-point Gaussian subpixel estimator is used. The type of image transformation (affine or projective) can be selected by the user.
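The core of this technique can be illustrated with a short numpy sketch, assuming two equally sized grayscale interrogation areas: the cross-correlation surface is computed via the FFT correlation theorem, the correlation peak is located, and a three-point Gaussian fit refines the peak position to subpixel precision along each axis. This is a sketch of the general principle only, not of the actual PIVlab or OpenPIV implementations.

```python
import numpy as np

def fft_displacement(ia_ref, ia_cur):
    """Estimate the shift between two equally sized grayscale interrogation
    areas via FFT cross-correlation with Gaussian subpixel peak refinement.
    Assumes the correlation peak does not lie on the border of the plane."""
    a = ia_ref.astype(float) - ia_ref.mean()
    b = ia_cur.astype(float) - ia_cur.mean()
    # Cross-correlation through the FFT correlation theorem
    corr = np.fft.fftshift(np.real(np.fft.ifft2(np.fft.fft2(a) *
                                                np.conj(np.fft.fft2(b)))))
    iy, ix = np.unravel_index(np.argmax(corr), corr.shape)

    def gauss_peak(cm, c0, cp):
        # Three-point Gaussian fit along one axis (log of positive values)
        lm, l0, lp = (np.log(max(c, 1e-9)) for c in (cm, c0, cp))
        return (lm - lp) / (2.0 * (lm - 2.0 * l0 + lp))

    dy = iy - corr.shape[0] // 2 + gauss_peak(corr[iy - 1, ix], corr[iy, ix],
                                              corr[iy + 1, ix])
    dx = ix - corr.shape[1] // 2 + gauss_peak(corr[iy, ix - 1], corr[iy, ix],
                                              corr[iy, ix + 1])
    return dx, dy
```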
Similarly to the previous tool, FFT-DCH (https://github.com/salpeha/FFTVidStabilization) is based on OpenPIV (Liberzon et al., 2020) and uses an FFT-based cross-correlation technique to estimate the frame-to-frame apparent motion. However, the implementation offers fewer options for feature tracking and transformation than FFT-CUAS. The frame-to-frame motion is determined by comparing four subwindows of a frame to the corresponding windows of the previous frame. The user defines four points located at static positions in the image (e.g., no flowing water, no wind-moved vegetation), preferably at the corners, as well as the size of the subwindow (search area). The search areas are then defined around these four points in the image. The size of the search area depends on the image resolution and the expected frame-to-frame apparent motion.
At the time of the analysis, the available version of the tool did not allow the search areas to be updated with the positions of tracked features from subsequent frames.
Even though both tools presented in this section are based on similar metrics, their implementations are significantly different, and their comparison will aim to expose the level of importance that a tool's implementation has on the overall accuracy and robustness of the stabilisation.

SSIMS: SSIM-based tracking
This tool (https://github.com/ljubicicrobert/SSIMS) is based on an image comparison metric developed by Wang et al. (2004) and is implemented in the SSIM Stabilisation Tool. The Structural Similarity (SSIM) index can be used to compare two images of the same size and to assess their overall similarity. Unlike some image comparison metrics, such as the mean squared error (MSE), SSIM is significantly more robust to global changes of brightness and contrast, as it mostly relies on structural information. A specific operator is convolved over the corresponding search areas from consecutive frames, comparing sub-regions from the current frame with those from the reference frame. This workflow generates a score map with values from -1 to 1, where 1 indicates a perfect match. The position of the maximal score indicates the likely position of the tracked feature in the current frame. To further improve the accuracy of feature position detection, an arbitrary-sized Gaussian subpixel peak estimator is implemented. The positions of the SAs are updated using the positions of the tracked features. Both the "fixed" and "updated" reference coordinate system strategies are implemented, but the use of the latter is generally only required for videos with a significant rotation of the ROI (usually > 15°) or significant scaling caused by altitude changes and/or camera zooming.
The tool also offers an option to estimate the per-feature performance of the feature tracking stage (post-tracking) by employing a specific root-mean-square-difference (RMSD) analysis, to help the user manually choose which of the tracked features are to be used in the image transformation stage. The image transformation can be performed using any of the possible homographic methods – similarity, affine, projective – while also allowing least-squares- or RANSAC-based (Random Sample Consensus; Fischler and Bolles, 1981) filtering of unacceptable feature correspondences.
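The principle of SSIM-based template search can be sketched with scikit-image, which provides a reference implementation of the SSIM index: the interrogation area is compared against every candidate window inside the search area, and the maximum of the resulting score map marks the most likely feature position. The exhaustive loop below is a deliberately simple illustration of the technique; the SSIMS tool itself may use an optimised implementation, and it adds Gaussian subpixel refinement on top of the score map.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_track(ref_ia, search_area):
    """Locate the interrogation area ref_ia inside a larger search_area by
    exhaustive SSIM comparison. Both inputs are single-channel uint8 arrays;
    ref_ia must be at least 7x7 px (the default SSIM window size)."""
    h, w = ref_ia.shape
    H, W = search_area.shape
    score_map = np.full((H - h + 1, W - w + 1), -1.0)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search_area[i:i + h, j:j + w]
            score_map[i, j] = structural_similarity(ref_ia, window,
                                                    data_range=255)
    # The maximum of the score map marks the most likely feature position
    return np.unravel_index(np.argmax(score_map), score_map.shape)
```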

KLT-IV: Kanade-Lucas-Tomasi tracking
This stabilisation approach is an inbuilt function within the MATLAB-based image velocimetry application KLT-IV (Perks, 2020) (https://sourceforge.net/projects/klt-iv). This application has been developed for the generation of surface velocity estimates and river flow observations from cameras on both fixed and moving platforms (e.g., UAS). The reference coordinate system is defined by the first frame of the sequence, and subsequent images are aligned to it.
Firstly, the strongest 10% of detected corner points are automatically selected from each of the four quadrants of the image based on a minimum eigenvalue algorithm (Shi and Tomasi, 1994). This maximises the point distribution across the image and ensures that the strongest features are used in the stabilisation process. However, this approach is not fully automated, as the user is required to define an ROI. The ROI polygon defines the area within the image where motion occurs (i.e., where the image velocimetry analysis should be focussed), and it is assumed that the static features are located beyond this ROI. Only the corner points retained in the first step that are located beyond the ROI are used in the stabilisation process.
For each subsequent frame within the video sequence, corner features are detected and matched to the points within the reference frame using the Kanade-Lucas-Tomasi (KLT) feature-tracking algorithm composed of five pyramid levels. After the frame sequence has been stabilised, there is an option of running a second pass to stabilise the image sequence further. The difference between the first and second pass is that the search area (block size) is reduced in the second iteration. The first pass can therefore be seen as a coarse registration, with the second being a fine registration. The second pass is only required for videos exhibiting significant movement (e.g., the Basento case study described in Sect. 2.4), with most deployments (e.g., the Kolubara and Alpine case studies in Sect. 2.4) requiring a single pass for acceptable image stabilisation results.
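The quadrant-based feature selection described above can be sketched in Python/OpenCV, where goodFeaturesToTrack implements the Shi-Tomasi minimum-eigenvalue detector and returns corners sorted by decreasing quality. The ROI mask handling and all parameter values are illustrative assumptions for this sketch, not KLT-IV's MATLAB code.

```python
import cv2
import numpy as np

def quadrant_corners(gray, roi_mask, keep_frac=0.10):
    """Select the strongest minimum-eigenvalue corners per image quadrant,
    excluding the user-defined ROI where motion (water) is expected.
    A sketch of the selection strategy described for KLT-IV."""
    H, W = gray.shape
    selected = []
    for y0, y1 in ((0, H // 2), (H // 2, H)):
        for x0, x1 in ((0, W // 2), (W // 2, W)):
            quad = gray[y0:y1, x0:x1]
            # Detect only outside the ROI, i.e. in the assumed static area
            mask = (roi_mask[y0:y1, x0:x1] == 0).astype(np.uint8)
            pts = cv2.goodFeaturesToTrack(quad, maxCorners=500,
                                          qualityLevel=0.01, minDistance=10,
                                          mask=mask)   # Shi-Tomasi detector
            if pts is None:
                continue
            n_keep = max(1, int(keep_frac * len(pts)))
            # Corners are returned sorted by decreasing quality measure
            for p in pts[:n_keep]:
                selected.append(p[0] + (x0, y0))       # full-image coordinates
    return np.float32(selected).reshape(-1, 1, 2)
```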

Blender video editor
Blender (https://www.blender.org) is a complete 3D modelling and animation suite which also contains a video editor with stabilisation capabilities. While not aimed specifically at either velocimetry or video stabilisation, Blender is a popular, free, and open-source off-the-shelf software package which offers both manual and automatic selection of well-defined features. While it is clear from the user manual that the feature tracking relies on an IA/SA approach (similarly to all the previous tools), the metrics used for the estimation of feature displacements are not clearly documented. However, we have included this tool in the comparison to investigate whether the use of dedicated tools is necessary for UAS video stabilisation purposes, or whether general-purpose software is accurate, fast, and simple enough to be used for this application.

Automatic feature selection approach
Advances in computer vision techniques have enabled automatic detection of well-defined image features, which can be used to estimate the relationship between two images. Automatic feature selection algorithms aim to detect and describe specific, distinct features in an image, such as local corners or blobs, which display high pixel intensity gradients in at least two directions. For each detected feature, a descriptor is calculated that summarises the structure of the feature. For the purpose of image stabilisation, such detection and description can be performed for two consecutive frames from the video. Once such features have been automatically detected, feature matching is performed – their descriptors are compared, for instance, by calculating the Euclidean distance between the n-dimensional descriptor vectors (Lowe, 2004). The main parameters of feature detection and matching methods are the detection threshold (which determines the sensitivity of the feature detection, and therefore the number of detected features), the feature matching algorithm, the matching threshold, and the matching ratio. Once the feature pairs have been detected, a transformation matrix between the two images can be determined. Since automatic detection algorithms usually detect a relatively high number of feature pairs compared to the approaches with manual feature selection, the possibility of outliers is increased and their detection therefore becomes necessary (e.g., using a RANSAC filter).
The general outline of automatic feature selection and tracking algorithms is presented in Fig. 2. Such algorithms generally do not require artificial GCPs in order to perform adequately, which can be a significant advantage for fieldwork on inaccessible terrain. However, the general absence of a priori knowledge on the stability and quality of detected features in the reference image requires the algorithm to collect a high number of candidate static features – often hundreds or thousands – in order to obtain adequate results. Hence, the automatic feature selection strategy offers the benefit of lower operator involvement at a higher computational cost.

Figure 2. General outline of the automatic feature selection/tracking approach
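A minimal sketch of this detect-describe-match-filter chain, using OpenCV's AKAZE detector as an example: binary descriptors are matched by Hamming distance, a ratio test discards ambiguous correspondences, and RANSAC filters the remaining outliers while estimating the homography. The threshold values are illustrative assumptions, and this is a generic sketch rather than the implementation of either tool tested here.

```python
import cv2
import numpy as np

def match_frames_akaze(ref_gray, cur_gray, ratio=0.7):
    """Detect and describe features with AKAZE, match the binary descriptors
    by Hamming distance, apply a ratio test, and estimate a RANSAC-filtered
    homography. Assumes at least four good matches survive the filtering."""
    akaze = cv2.AKAZE_create()                  # detection threshold is tunable
    kp1, des1 = akaze.detectAndCompute(ref_gray, None)
    kp2, des2 = akaze.detectAndCompute(cur_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)   # brute-force matcher
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        # Ratio test: keep a match only if clearly better than its runner-up
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects remaining outlier correspondences
    H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 3.0)
    return H
```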
In this research, we investigated two stabilisation tools based on automatic feature detection algorithms: (1) Features from Accelerated Segment Test (FAST), presented by Rosten and Drummond (2006) and implemented by VISION (Pizarro et al., 2021), and (2) Accelerated-KAZE (AKAZE), proposed by Alcantarilla et al. (2013) and implemented by FlowVeloTool (Eltner et al., 2020).
Compared to other popular automatic feature detection algorithms, FAST is generally more computationally efficient. The FAST detection and matching algorithms used in the tested stabilisation tool are implemented in a command-line function written in MATLAB. FAST is able to identify edges as feature points in grayscale images with low scale changes, after which a descriptor is computed around the detected features using the Fast Retina Keypoint (FREAK) algorithm. FREAK is a binary descriptor which accounts for changes in scale and rotation and is used to find feature point correspondences among images and, therefore, to stabilise them. A RANSAC algorithm was also applied with the intention of removing false matches, catalogued as outliers. Two variants of the approach were tested: (1) allowing the feature detection across the entire image, or (2) manually selecting an ROI where the static features are most likely to be located. The latter approach can be thought of as a manual filtering stage which can provide significant accuracy and efficiency improvements.

AKAZE detects and describes features in a nonlinear scale space, and candidate correspondences are accepted only if the second-closest match reveals a significantly larger descriptor distance than the first match (ratio test). In the stabilisation module of the FlowVeloTool, the AKAZE feature detector and descriptor and a brute-force matcher are used to find corresponding keypoints.

Image transformation
For velocimetry purposes, all the analysed frames should exist in the same (reference) coordinate system so that real-world velocity estimation can be performed from in-image pixel displacements. When dealing with data that experiences apparent motion, this is achieved by applying linear geometric (homographic) transformation techniques to the raw images – similarity, affine, or projective transformation.
Linear image transformation is defined by three geometric operations: (1) translation, (2) rotation, and (3) scaling, and can be summarised by a matrix multiplication of the original pixel-space coordinates:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} a_1 & a_2 & b_1 \\ a_3 & a_4 & b_2 \\ c_1 & c_2 & 1 \end{bmatrix} \qquad (1)$$

where x and y are the original point coordinates, x' and y' are the transformed point coordinates, and T is the transformation matrix. In the matrix T, a_1...a_4 represent the rotation and scaling parameters, while b_1 and b_2 are the translation parameters. Image shear is defined within the parameters a_1 and a_4. The parameters c_1 and c_2 define the projection vector; in the case of the projective transformation, these parameters define the 3D rotation of the image plane around the horizontal and vertical image axes. All three approaches preserve collinearity and incidence. The similarity and affine methods also preserve parallelism, while the projective (in general) does not. In the affine transformation, c_1 and c_2 are zeros, while the similarity transformation is a special case of the affine method with shearless rotation. Due to this, both the similarity and affine transformation methods are merely special cases of the projective method, and both could be unable to account for image deformations in cases with significant pitch and roll rotations. However, as such cases are exceedingly rare in UAS velocimetry, where the camera is optimally in nadir orientation, all three transformation methods can potentially be used with comparable results.
In order to define a transformation matrix T, relationships between two point pairs are needed for the similarity transformation, three pairs for the affine, and four pairs for the projective transformation. Examples of the presented methods are shown in Fig. 3. Since all three described transformation methods are linear – they transform an image from one plane in 3D space onto another plane – their accuracy will be impacted by the camera distortion parameters. If significant barrel- or pincushion-type distortion is present in the original images, it should be removed prior to the image transformation.
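In practice, the three transformation types can be estimated from matched point pairs with standard OpenCV routines, as in the following sketch (the coordinates and file name shown are hypothetical):

```python
import cv2
import numpy as np

# Matched static-feature coordinates, shape (N, 1, 2); values are hypothetical
src = np.float32([[10, 10], [400, 20], [390, 300], [20, 310]]).reshape(-1, 1, 2)
dst = src + np.float32([3.2, -1.5])                   # pure translation example

# Similarity (2 point pairs minimum): translation + rotation + uniform scale
S, _ = cv2.estimateAffinePartial2D(dst, src)          # 2x3 matrix
# Affine (3 pairs minimum): additionally allows shear and non-uniform scale
A, _ = cv2.estimateAffine2D(dst, src)                 # 2x3 matrix
# Projective (4 pairs minimum): adds the projection parameters c1, c2
H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 3.0)  # 3x3 matrix

frame = cv2.imread("frame_0001.png")                  # hypothetical frame
h, w = frame.shape[:2]
stab_similarity = cv2.warpAffine(frame, S, (w, h))
stab_projective = cv2.warpPerspective(frame, H, (w, h))
```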

Case studies
For the purpose of performance comparison of the presented tools, three case studies with different ground and flight conditions were analysed. The case studies were specifically chosen to exemplify a gradual increase of UAV motion intensity and complexity, so that the limitations of specific tools could be adequately assessed:

1. UAS video with uniform GCP patterns and low camera movement (translation, rotation),
2. UAS video with various GCP patterns and moderate camera movement (translation, rotation),
3. UAS video without GCPs and with significant camera movement (translation, rotation, perspective deformation, and scaling).

The purpose of the first case study is to examine the performance of stabilisation algorithms in highly controlled conditions – low amounts of UAS/camera movement and vibrations, no significant rotation or altitude changes, and all GCPs of the same pattern, positioned at the same level and at identical distances from the water surface. In total, six GCPs were positioned in the ROI – two 65x65 cm and four 20x20 cm in size (approx. 65x65 px and 20x20 px in the images, respectively), as shown in Fig. 4. However, not all control points were used in the stabilisation procedure: four points (marked GCP 1-4) were used for the stabilisation, and two (marked V1, V2) were intentionally omitted in order to be used as verification points in the stabilisation accuracy analysis (method described in Sect. 2.5.2). This limitation was, understandably, imposed only on those tools which employ manual feature selection. The experiment was conducted during low flow conditions on the Kolubara river in Serbia, in November 2018.

Both previous case studies involve the use of GCPs for image stabilisation in relatively controlled conditions. The third case study, conducted on the Basento river in Italy (ROI presented in Fig. 6), aims to investigate whether the analysed stabilisation tools are applicable to a video with no artificial GCPs or well-defined static features, which also contains a high amount of camera movement (Dal Sasso et al., 2020; Pizarro et al., 2020). A black-and-white pole was placed in the ROI during the video recording and was later used to identify the ground sampling distance of 0.5 cm px⁻¹. The unfavourable video recording conditions were expected to be more challenging for the stabilisation tools and could help with the identification of the limitations of specific approaches.

Grayscale images were extracted directly from the UAS videos and were not preprocessed or filtered in any way prior to the stabilisation. Frames from the Kolubara and Basento studies were extracted at original framerates, but for the Alpine case study every 5th frame of the original video was extracted and used, in order to further increase the amount of apparent interframe motion. Relevant metadata of the videos used in this study are presented in Table 1, while the location of the dataset with both unstabilised and stabilised images is listed in Table A3.

Comparison metrics
In order to evaluate the performance of the stabilisation algorithms, it is necessary to measure the residual displacement of static features in the stabilised frame sequences, i.e. the stabilisation errors. Considering that the residual displacements are both spatially and temporally distributed in the stabilised frame sequences, both distributions should be adequately described. For this reason, we have used a combination of two approaches for performance assessment:

1. For the estimation of the spatial distribution of residual motion, we have applied a surface velocity field (SVF) analysis between the initial frame and a selected number of frames from the sequence. Such analysis was only performed on those regions of the ROI which are comprised of static features. This strategy has the benefit of illustrating the type of apparent motion (e.g., translation, rotation, tilt, scale change) observed in a frame sequence.
However, such analysis can only be effectively used for a relatively small number of frame pairs, since the character and intensity of the spatial distribution of residual motion cannot be efficiently summarised for the entire frame sequence unless the motion type and intensity do not vary.

2. For the estimation of the temporal distribution of residual motion, we describe and propose an alternative metric for the estimation of the magnitude of the residual displacement of static features, based on pixel-intensity root mean square differences (RMSD). This metric can be effectively applied across the entire frame sequence for a small number of selected static features. In this research, such static features were the verification points denoted as V# in Figs. 4 and 5. An additional benefit of this metric is that it is not contained within any of the feature displacement estimation techniques used by the analysed stabilisation tools – hence, no bias towards individual methods is expected.
However, this method does not provide information on the actual type of motion.
Considering the aforementioned, the two comparison metrics aim to provide complementary insights into the types, intensities, and (potentially) sources of errors for different stabilisation tools.

SVF analysis
Surface velocity field estimation is performed for some of the representative frame pairs in order to estimate the performance of the stabilisation algorithms when dealing with different types of camera motion. When applied to stabilised frames, SVF analysis allows assessment not only of the magnitude of the residual displacement, but also of its direction, thus exposing the strengths and weaknesses of a stabilisation approach when dealing with different types of camera motion. The disadvantages of this approach are its computational complexity and the difficulty of generalising the analysis results.

For the Kolubara case study, we analyse four frame pairs formed by combining the initial frame (frame 1 in the sequence) with each of the frames 51, 151, 351, and 551. These frames were selected because (1) they illustrate different types of motion – tilt, rotation, scale change, and the combination of the latter two – and (2) the motion magnitude is sufficient for unambiguous visual identification of the motion type. For each of the frame pairs, dense surface velocity fields are calculated with the use of the FFT cross-correlation implemented in PIVlab, and then aggregated to eight vectors which characterise eight sub-regions of the ROI (Fig. 7). The choice of sub-regions was motivated by the following criteria: (1) they contained no moving features, (2) they were sufficiently lit, and (3) after summarising the vectors to one per sub-region, the level of detail was still sufficient for determining the type of the residual apparent motion.

For the Alpine case study, SVFs are calculated for the two islands (see Fig. 5) located in the middle of the river and averaged to one vector per island. Since the camera motion in this case study has the same direction throughout the image sequence, SVF analysis is performed for one frame pair for each stabilised image sequence. This frame pair (frames 1 and 152) was selected with the goal of maximising the magnitude of the apparent frame-to-reference motion of static features, which gradually increases towards the end of the frame sequence.
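The aggregation step can be sketched as follows, assuming a dense velocity field (u, v) on a regular grid and an integer label array assigning grid nodes to sub-regions; this is an illustrative reduction to one median vector per sub-region, not the exact PIVlab post-processing used here:

```python
import numpy as np

def aggregate_svf(u, v, labels):
    """Aggregate a dense surface velocity field (u, v) into one median vector
    per sub-region. `labels` holds an integer sub-region id for every grid
    node, with 0 marking excluded (non-static or moving) nodes."""
    vectors = {}
    for region in np.unique(labels):
        if region == 0:
            continue
        sel = labels == region
        # Medians are more robust than means to opposing displacements
        # that would otherwise cancel out during averaging
        vectors[region] = (np.nanmedian(u[sel]), np.nanmedian(v[sel]))
    return vectors
```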

Point displacement through RMSD
In order to allow for automated quantification of residual motion magnitude in the entire image set, we propose a 2D root mean square difference (RMSD) metric which operates by directly comparing a number of sub-regions within subsequent images.
The aim of the proposed method is to provide a quantitative description of subpixel displacements that is easy to compute directly from images. Small rectangular sub-regions are sampled from the stabilised images and compared to the same regions in the reference frame. The differences in pixel-wise intensities between the sub-region pairs indicate the similarity of the two regions and, subsequently, the quality of the image stabilisation. For two sub-regions from images A and B, with heights and widths N x N (where N is an odd number) and centres at (x_0, y_0), we define n = N/2 - 1/2, so that the RMSD can be calculated as:

$$\mathrm{RMSD}_{A,B}(x_0, y_0) = \sqrt{\frac{1}{N^2} \sum_{i=-n}^{n} \sum_{j=-n}^{n} \left[ Y_A(x_0+i,\, y_0+j) - Y_B(x_0+i,\, y_0+j) \right]^2} \qquad (2)$$

where Y_A and Y_B are the single-channel (e.g., grayscale) pixel intensities of images A and B, respectively. This comparison can be performed for a number of chosen feature points in the images. The average RMSD over all features in the frame sequence represents the total score of the selected method, with a lower score indicating a higher similarity between the compared sub-regions and, therefore, a higher stabilisation accuracy. The choice of comparison features, as well as the size of the examination sub-region, is presented for each relevant case study in Sect. 2.4. Similarly to the feature tracking strategies, two approaches can be applied: (1) using the initial frame as the reference frame, or (2) frame-by-frame comparison. The former criterion is important when an accumulation of errors is possible in the stabilisation algorithm, and mostly describes the impact on the pixel positioning accuracy, while the latter can be a better estimate of the overall impact on velocimetry results, as velocimetry also uses a frame-by-frame comparison.
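Equation (2) translates directly into a few lines of numpy; the sketch below assumes single-channel images indexed as image[row, column] and an odd window size N:

```python
import numpy as np

def rmsd(img_a, img_b, x0, y0, N=31):
    """Pixel-intensity RMSD of Eq. (2) between two N x N sub-regions (N odd)
    centred at (x0, y0); assumes the window lies fully inside both images."""
    n = N // 2                                   # integer form of N/2 - 1/2
    ya = img_a[y0 - n:y0 + n + 1, x0 - n:x0 + n + 1].astype(np.float64)
    yb = img_b[y0 - n:y0 + n + 1, x0 - n:x0 + n + 1].astype(np.float64)
    return np.sqrt(np.mean((ya - yb) ** 2))
```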
The use of the first frame as the reference for matching subsequent frames, as opposed to an approach where the reference is the previously stabilised frame in the sequence, is beneficial as the potential for drift in the stabilised output is eliminated.
However, a limiting factor is that the reference frame must share a significant portion of the field of view with each image within the sequence to enable features to be matched and an accurate transformation to be applied.
The goal of the analysis is to establish a correlation between the easily calculable RMSD and the actual displacement d, in the form d = f(RMSD), so that the temporal distribution of residual displacements can be efficiently estimated. The parameters used by each of the tools to produce the stabilised images are presented in Table A1 in the Appendices.
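The calibration of d = f(RMSD) can be sketched by reproducing the synthetic test described in Sect. 2.5.2: the reference frame is artificially shifted by known whole-pixel displacements, the RMSD is computed at a validation point with the rmsd() sketch above, and the coefficients are fitted by least squares. The zero-intercept second-degree form of Eq. (3) used below is an assumption made for this illustration.

```python
import numpy as np

def calibrate_d_rmsd(ref_img, x0, y0, N=31, max_shift=5):
    """Fit the d(RMSD) relationship for one validation point by artificially
    shifting the reference image by whole pixels. Returns the coefficients
    (a, b) of the assumed zero-intercept form d = a*RMSD**2 + b*RMSD."""
    shifts, scores = [], []
    for d in range(1, max_shift + 1):
        shifted = np.roll(ref_img, d, axis=1)    # horizontal shift by d px
        shifts.append(float(d))
        scores.append(rmsd(ref_img, shifted, x0, y0, N))
    # Least-squares fit without a constant term
    X = np.column_stack([np.square(scores), scores])
    (a, b), *_ = np.linalg.lstsq(X, np.array(shifts), rcond=None)
    return a, b
```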

Results
In this section we present the analyses of the stabilised image sets. In order to obtain the best performance from each of the stabilisation tools, the authors of the respective tools were provided with the unstabilised videos and given the task of providing stabilised frame sequences; the authors were given the freedom to choose the approach they found suitable for each case.

SVF analysis
The SVF analysis of the unstabilised video for the Kolubara case study revealed that the magnitude of apparent motion in the four selected frame pairs was between 4 and ~38 px. Frame pairs illustrating rotation and rotation/scale change were characterised by the greatest instability. Vector fields illustrating details for the eight sub-regions of the ROI in each of the unstabilised frame pairs are displayed in Table 2.

Table 3 presents the residual SVFs calculated for each of the four representative frames in the datasets stabilised with the different tools, averaged for each of the eight sub-regions of the ROI. The residual apparent movement of static features has a magnitude of less than 3 px for all the analysed stabilisation tools, and for many of them it is below 1 px on average. According to the SVF analysis, the most stable sub-region in the frames processed with FFT-CUAS is located on the left side in the middle of the ROI along the river (sub-regions 2, 4, 6 in Fig. 7). A peculiarity of image stabilisation with the use of Blender/M is the change in image colour; visual analysis of this change indicates contrast reduction, which complicates the identification of traceable features (Fig. 8).

Overall, the SVF analysis results of the first case study indicate that most of the analysed stabilisation tools transform raw frames in a way that the residual displacement of the static features lies in the subpixel range. For the second case study, the residual displacement of the static features was estimated for the two islands in the middle of the river, since the stability of this region is of the most interest in the light of the prospective image velocimetry analysis. SVFs calculated for both islands were further averaged to one vector per island (Table 4).

Analysis of the residual SVFs for the Alpine case study indicated that FAST did not manage to compensate for the apparent motion in the original frame sequence, but rather decreased its magnitude. Similarly to the Kolubara case study, Blender/M significantly overcompensated for the apparent motion in the raw frames and changed the colours in the images, reducing the contrast and complicating the recognition of traceable features (Fig. 8). The remaining tools kept the residual displacement of the static features low; it is, however, important to keep in mind that, due to the automatic feature selection mechanism, AKAZE used static features on the islands for image stabilisation, which was likely to increase their stability. Blender/M was the only tool whose application resulted in colour changes that may have a negative influence on the quality of the prospective image velocimetry analysis.

Point displacement through RMSD
To examine the adequacy of the proposed RMSD metric for use instead of manual planar displacement measurements, we devised a synthetic test. Using the initial frames from the original videos, displacements were induced by artificially shifting the images in the horizontal and vertical directions by Δx and Δy pixels (both positive and negative displacements were considered). To find suitable validation points, a number of candidate points were manually selected, and the d(RMSD) relationship was determined. Candidate points with a high R² of the d(RMSD) relationship were selected as validation points, and these are marked as V# in Figs. 4 and 5. The results of the analysis are presented in Figs. 9 and 10 and demonstrate the following:

1. The RMSD score generally depends on the size of the interrogation window, defined by its width/height N. The size of the interrogation area around the selected feature points was gradually increased while calculating the corresponding R². The optimal IA size (N = 31 px) was selected as the one with the maximal average R² over all features.

2. The relationship between the displacement d and the RMSD score can be approximated by a second-degree polynomial, d = a RMSD² + b RMSD (Eq. 3). Using Eq. (3), one can estimate the displacements d from the RMSD scores. The accuracy of the estimation is highest for low displacements but deteriorates quickly for displacements higher than 4-5 px. The coefficients a and b are presented in Table A2 in the Appendix for the individual validation points.
3. For displacements lower than 7-8 px, the relationship between d and the RMSD score is monotonic. It was determined that this relationship is correlated with the size of the static feature used for validation.

4. For displacements larger than 7-8 px, the RMSD plateaus or even decreases. This can be explained by the loss of the tracked static feature from the search area for large displacements. Once the search area contains only the unstructured background pattern, the RMSD approach is unable to quantify further displacements due to the increased likelihood of ambiguities, and the relationship between the RMSD and d is undefined.

The synthetic tests indicate that the RMSD can be an adequate representation of the planar displacement d in cases where the displacement between two images is within the range where the second-degree polynomial provided a high R² (< 4-5 px). This condition was satisfied for any two consecutive frames of both the unstabilised and stabilised frame sequences of case studies 1 and 2. If the initial frame is used as the reference, the condition was satisfied only for the stabilised frames. For the raw frame sequences, the displacements d obtained through RMSD scores are likely to be underestimated due to the increased chance of ambiguities for large displacements, in which case the proposed approach is unsuitable.

Using the d(RMSD) metric defined by Eqs. (2) and (3), displacements of the validation points were estimated in every frame of the stabilised sequences. The estimated displacements are presented in the form of standard boxplots in Figs. 11 and 12 for the Kolubara case study, and in Figs. 13 and 14 for the Alpine case study. The Blender software was excluded from the pixel-intensity-based comparison due to the previously described colour grading issues, which caused a positive bias towards this stabilisation tool. While the results seem to confirm the expected bias of PIVlab towards FFT-CUAS in the SVF analysis, the d(RMSD) metric confirms that FFT-CUAS objectively presents adequate results, comparable with all other well-performing tools.

The two remaining tools – FFT-DCH and FAST – had shown significantly higher residual displacements. Displacements after stabilisation with FFT-DCH and FAST were up to 4.3 px and 2.8 px in the Kolubara case, respectively. In the Alpine case, the results obtained using FAST are above the applicability limit of Eq. (3), with indicated errors of up to 7 px. Further inspection of the stabilised video obtained using FAST showed that the raw motion of the camera/UAS was not properly counteracted, but rather decreased in intensity. In the Alpine case study, FFT-DCH provided results comparable with KLT-IV, SSIMS, and FFT-CUAS, with predominantly subpixel errors and median values between 0.4 and 0.8 px. This is potentially indicative of a high sensitivity of the FFT-DCH tool to different ground conditions and/or parameter changes.

The results of the frame-to-reference d(RMSD) analysis generally substantiated the results of the SVF analysis. Interestingly, even though they employ different algorithms, almost all the tools demonstrated higher median errors and variability in the same regions – the right riverbank in the Kolubara case and the leftmost island in the Alpine case – according to both evaluation metrics. When compared to Figs. 11 and 13, the results obtained using different tools show fewer differences, with all tools now demonstrating median errors below 0.5 px in the Kolubara case and 0.65 px in the Alpine case.
However, the total number of outlier values is higher than in the comparison with the initial frame (Figs. 11 and 13), which can in part be explained by the potential overcompensation of motion between two consecutive frames, leading to artificially induced oscillations (jitter) in the stabilised images. AKAZE still demonstrates the lowest displacements, followed by (in order of decreasing accuracy, averaged over the two case studies) KLT-IV, SSIMS, and FFT-CUAS. The apparent displacements in the results of FFT-DCH and FAST are considerably lower when displacements are estimated relative to the previous frame, i.e. when the effects of error accumulation are not considered.
Analysis of the Basento river case study proved to be more challenging than the previous two, considering the complete lack of GCPs or unique static features, as well as the variety and intensity of the camera/UAS motion, with sudden strong translation and rotation complemented by changes in scale. For these reasons, the comparison was only performed qualitatively, based on the criteria of jitter intensity (sudden and strong movement induced by stabilisation tool inaccuracy, rather than actual camera motion), residual motion intensity (original camera motion that could not be accounted for by the stabilisation tool), and the presence of undesirable deformation of the water surface area (ROI), caused mainly by the parallax effect (defined by the apparent differences between image foreground and background motion). The results are presented in Table 5.

Out of the seven tools tested, only FFT-DCH was not able to complete the stabilisation over the entire 38 seconds of the video. Three tools were able to stabilise the video adequately and completely: FFT-CUAS, SSIMS, and Blender/M. However, the colour grading issue in the Blender results persists and significantly deteriorates the quality of the surface tracers which are meant to be used in image velocimetry. The results obtained using KLT-IV were significantly impacted by the sudden reduction of the visible part of the left riverbank, which caused moderate but persistent jitter throughout the remainder of the sequence. AKAZE and FAST, which employ a fully automatic selection of static features, demonstrated severe jitter and/or residual motion in the results, likely also caused by the sudden loss of riverbank features on the left-hand side of the video frames. As in the Alpine case, FAST was also not able to properly counteract the original motion, but rather reduced its intensity.
While seemingly compensating for the general camera motion, AKAZE produced significant jitter on the left riverbank which persisted throughout the entire video. To test the hypothesis that automatic feature selection leads to lower stabilisation quality in challenging video recording conditions, we performed another stabilisation using Blender in its automatic feature selection mode (Blender/A). When allowed to automatically select the static features, Blender/A periodically produced jittery motion because of the occasional escape of the tracked features from the frame, which, combined with the parallax effect, caused the ROI to experience deformations.
To examine the complexity of the stabilisation procedure using different tools, we timed the feature tracking and image transformation procedures. The average per-frame stabilisation time is presented in Table 6, along with the RAM usage range throughout the stabilisation process. The latter information was included to demonstrate the generally high requirements of the off-the-shelf software when compared to the purpose-specific tools. Table 6 also includes information on the stabilisation capabilities of the tools with regards to rotation and scale changes, capabilities for simultaneous orthorectification of images when the real-world positions of static features are known, and the existence of an accompanying graphical user interface (GUI).
Finally, we present the information regarding the programming language requirements for each tool, and a short summary of other capabilities of the specific tools.
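Per-frame timings and RAM usage of the kind reported in Table 6 can be collected with a wrapper such as the following sketch, where stabilise_frame() is a placeholder for the tool under test and psutil (a third-party package) supplies the memory readings:

```python
import time
import psutil  # third-party package, assumed available

def profile_stabilisation(stabilise_frame, n_frames):
    """Time a per-frame stabilisation routine and record peak RAM usage.
    `stabilise_frame` is a placeholder callable for the tool under test."""
    proc = psutil.Process()
    peak_rss = 0
    t0 = time.perf_counter()
    for frame_idx in range(n_frames):
        stabilise_frame(frame_idx)
        peak_rss = max(peak_rss, proc.memory_info().rss)
    elapsed = time.perf_counter() - t0
    print(f"average per-frame time: {elapsed / n_frames:.3f} s, "
          f"peak RAM: {peak_rss / 1e9:.2f} GB")
```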

Discussion
The results presented in the previous section aim to illustrate the importance of image stabilisation in velocimetry analyses.
Even in favourable weather conditions, motion induced by the UAS platform is far from insignificant. The flow conditions in the Kolubara case study featured a low average surface velocity of approx. 0.12 m s⁻¹, while the orthorectified images had a ground sampling distance (GSD) of 1 cm px⁻¹ (Pearce et al., 2020). The Alpine river case had a somewhat lower GSD, with reference velocities (obtained using a current meter) between 0 and ~2 m s⁻¹ (Strelnikova et al., 2020). The Basento river video was captured at the lowest flight altitude and, despite also being captured at a lower resolution, had a GSD of 0.5 cm px⁻¹ with an average surface velocity of 0.40 m s⁻¹ (Dal Sasso et al., 2020; Pizarro et al., 2020). Considering the results presented in the previous section, a stabilisation error of even a couple of pixels per frame could potentially induce significant errors in the estimated instantaneous velocities. Another important aspect of stabilisation is preventing the aggregation of camera motion (positional drift) over the course of the video. For that reason, motion in the stabilised datasets has been estimated both relative to the initial frame (SVF and d(RMSD) metrics) and relative to the previous frame in the sequence (d(RMSD) metric only), in order to identify which tools are susceptible to such error aggregation and to what extent.

Using SVFs as a metric of image stabilisation quality has the advantage of providing information about the spatial distribution and type of residual motion. This approach, however, has high complexity and is unsuitable for processing a large number of frames. Frame-to-frame feature displacements in opposite directions may compensate each other during averaging, creating a false impression of stability. A median-based generalisation is more robust than averaging, but there can still be disparities between the median velocity magnitudes and the velocity vectors calculated as median values of the u and v vector components. Aggregation of SVFs can lead to meaningful conclusions with regards to the type and magnitude of apparent motion in a frame sequence only if the motion type and direction do not vary greatly across the frames. Frame sequences can be divided into parts characterised by the same type of apparent motion, and SVFs may be aggregated for each of the sub-sequences; however, this only increases the complexity of using SVFs as an evaluation metric.
Considering the proposed d(RMSD) metric, it was demonstrated that it can be effectively used for the estimation of displacements of up to 4-5 pixels. It is important to note that the conclusions regarding this metric presented in Sect. 2.5.2 were derived for the case studies described in this paper and should not be generalised. However, the methodology can be applied to any image set if suitable validation points can be obtained, as indicated by a high R² value. It is worth noting that, unlike the SVF analysis, the d(RMSD) relationship does not provide any detail on the character of the residual motion – translation, rotation, scale, or a combination thereof. Finally, the proposed metric assumes that no significant changes in brightness are present during the video, as such changes would affect the RMSD score even if no actual displacements are present. Since the videos from the selected case studies were relatively short (up to 38 seconds), no global or local changes in brightness were observed.
Considering that FFT-CUAS uses the cross-correlation code of PIVlab to produce stabilised videos, it was possible that the PIVlab-based SVF analysis results could be biased towards this stabilisation tool. Therefore, the stabilisation accuracies of all tools were verified using the relationship between pixel intensity RMSD and pixel displacement d for several validation points in the Kolubara and Alpine river videos. While the results seem to confirm the presence of such SVF analysis bias towards FFT-CUAS, its reported accuracy according to d(RMSD) is still objectively high and comparable with the other best-performing stabilisation tools. A bias of the PIVlab-based SVF approach towards other tools was not evident in this research, and the SVF results are generally comparable with the d(RMSD) metric. The performance of any given stabilisation algorithm seems to depend on its
implementation in the given tool as well as on the feature detection/tracking algorithm. This is most obvious when comparing the results between the two approaches based on cross-correlation techniques – FFT-CUAS and FFT-DCH. While both are based on 2D cross-correlation algorithms, the implementation in FFT-CUAS is more flexible towards different ground and flight conditions. A similar observation can be made for FAST and AKAZE – while both rely on automatic feature detection algorithms, the implementation of the state-of-the-art AKAZE feature detection algorithm in the FlowVeloTool is supported by an optimised feature matching approach, which enables more consistent and accurate stabilisation. This conclusion is further reinforced by the fact that several feature tracking metrics have been able to deliver similar assessments of stabilisation accuracy, regardless of their different underlying methodologies.
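To make the automatic workflow concrete, the following is a minimal OpenCV sketch of feature detection, matching, and transformation estimation between two frames. It illustrates the general AKAZE-plus-matching approach discussed above, not the actual FlowVeloTool implementation; the file names and thresholds are placeholders.

```python
import cv2
import numpy as np

# Hypothetical file names; greyscale simplifies descriptor matching
ref = cv2.imread("frame_0000.png", cv2.IMREAD_GRAYSCALE)
cur = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)

# Detect AKAZE keypoints and binary descriptors in both frames
akaze = cv2.AKAZE_create()
kp_ref, des_ref = akaze.detectAndCompute(ref, None)
kp_cur, des_cur = akaze.detectAndCompute(cur, None)

# Lowe ratio test keeps only distinctive matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in matcher.knnMatch(des_cur, des_ref, k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])

src = np.float32([kp_cur[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC rejects non-static outliers (e.g. features on the water surface)
M, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
stabilised = cv2.warpPerspective(cur, M, (ref.shape[1], ref.shape[0]))
```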
Considering the first two case studies, which aim to depict common UAS velocimetry conditions, four tools have been able to provide adequate results: FFT-CUAS, SSIMS, KLT-IV, and AKAZE. The automatic tool AKAZE, implemented in the FlowVeloTool, demonstrated the highest median stabilisation accuracy of all the investigated tools. The abundance of static features in the Kolubara and Alpine cases can potentially favour automatic feature detection algorithms if such features are evenly spread out in the ROI. No constraints were imposed on FAST and AKAZE with regards to the areas in which features can be detected, which allowed those tools to potentially use the validation points for the estimation of the optimal transformation matrix, as well as the central islands from the Alpine case. While this degree of freedom could potentially create a bias towards such tools in the d(RMSD) analysis, it was assumed that such bias is relatively low due to the sheer number of features detected by both FAST and AKAZE, and it was not considered further in this research.
In the group of manual feature selection tools, FFT-CUAS, SSIMS, and KLT-IV provided accurate stabilisation with subpixel median accuracy, but no tool was dominantly more accurate than the others when both cases are considered. KLT-IV proved capable of consistent sub-0.5 px accuracy in cases with low interframe motion and low camera tilt. The d(RMSD) results demonstrate that verification point V1, located between the two elevated GCPs, is characterised by lower stability than the other control points. The same is confirmed by the SVF analysis: static features on the left island experienced more residual displacement regardless of the stabilisation tool used, with the exception of AKAZE. To alleviate such issues, it is recommended that stabilisation, whenever possible, is performed using static features at the same elevation. In cases where this principle cannot be followed, estimating the transformation matrix from a higher number of features than the minimum (for the chosen transformation method) can limit the extent of such errors. This can potentially explain the high accuracy of AKAZE in the Alpine case, where it used approx. 400 features to estimate the transformation matrix.
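As an illustration of why an overdetermined fit helps, the sketch below estimates an affine transformation from N ≥ 3 point pairs by least squares; with many well-distributed points, the error contributed by a single unstable feature (such as an elevated GCP) is averaged out rather than propagated directly into the transformation. This is a generic sketch, not the estimator used by any particular tool.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src to dst, both given as
    (N, 2) arrays of pixel coordinates with N >= 3. Using many more
    point pairs than the minimum of three makes the fit overdetermined,
    so no single feature dictates the result."""
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src   # even rows:  a*x + b*y + tx = x'
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src   # odd rows:   c*x + d*y + ty = y'
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params.reshape(2, 3)  # [[a, b, tx], [c, d, ty]]
```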
The distribution of GCPs and/or static features should also be considered. As a rule of thumb, the GCPs should be positioned as close to the water surface as practically possible in order to limit the parallax effect – if the water surface and the plane holding the GCPs are at different elevations, motion of the camera can introduce a parallax effect, demonstrated by the apparent motion of the two planes relative to each other. The intensity of this effect depends mostly on the ratio of the distances between the UAS and the water surface and between the UAS and the GCPs, and can thus also be limited by operating the UAS at higher flight altitudes. Additionally, such effects depend on the focal length of the camera – short focal lengths are more susceptible to the parallax effect. With tools relying on manual feature selection, it is beneficial to choose only the features closest to the water surface, as this substantially reduces the parallax effect. If the features are automatically selected from the entire image, there is no guarantee that they will be selected from the same elevation, unless constraints are imposed with regards to the areas in which the features are detected.
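Under a nadir-view pinhole model, the relative image displacement between the water surface and an elevated GCP can be approximated from the geometry alone, which makes the recommendations above easy to quantify. A minimal sketch, with illustrative numbers not taken from the case studies:

```python
def parallax_px(shift_m, altitude_m, gcp_elev_m, gsd_m):
    """Approximate relative displacement (px) between the water surface
    and a GCP elevated gcp_elev_m metres above it, caused by a lateral
    camera shift of shift_m metres at flight altitude altitude_m above
    the water, with GSD gsd_m (m/px) at the water surface."""
    return (shift_m / gsd_m) * gcp_elev_m / (altitude_m - gcp_elev_m)

# A 0.05 m lateral drift with GCPs 3 m above the water at 1 cm/px GSD:
print(parallax_px(0.05, 30.0, 3.0, 0.01))   # ~0.56 px at 30 m altitude
print(parallax_px(0.05, 60.0, 3.0, 0.01))   # ~0.26 px at 60 m altitude
```

Doubling the altitude while keeping the same GSD (i.e. using a longer focal length) roughly halves the parallax, consistent with the flight altitude and focal length recommendations above.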
In the Basento river video, which aims to demonstrate complex camera motion, tools based on manual feature selection (with the exception of FFT-DCH and KLT-IV, see Table 6) have performed significantly better than the automatic tools and have been able to provide adequate stabilisation accuracy. Both automatic tools, FAST and AKAZE, have been unable to provide adequate results throughout the entire video. What appeared to hamper the robustness of the automatic tools is the sudden drop in the number of features on one of the riverbanks. Even when there appeared to be a cluster of remaining features on the other riverbank, these were not adequately spread out across the image to ensure accurate stabilisation. The issue with the approaches in question is that they appear to require a significant portion of the image to contain adequately (evenly) distributed static features in order to generate a reliable transformation matrix, because there is no a priori guarantee that the selected features are actually static. The criterion for the selection of features using FAST and AKAZE is the distinctiveness of the feature within its neighbouring area. This issue is also evident, to a lesser degree, in the results of KLT-IV, which produced jittery camera motion once the features on the left bank became scarce, as this implementation allows some degree of freedom in feature selection – the user can only select the small window where the static feature is located, while the actual selection is performed using Hessian eigenvalues. Due to these limitations, automatic tools should be used only when a significant portion of the frames is covered by static features on both riverbanks, in order to ensure a reliable transformation. However, we have not investigated the requirements for the recommended minimum ratio of static areas (e.g., riverbanks) to non-static areas (e.g., water surface) for automatic feature selection approaches, as such an investigation would merit dedicated research.
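One simple way to detect the failure mode described above is to monitor how evenly the detected features cover the frame. The heuristic below is our illustration, not a check performed by any of the compared tools; it divides the frame into a grid and reports the fraction of occupied cells:

```python
import numpy as np

def coverage_ratio(points, frame_shape, grid=(4, 4)):
    """Fraction of grid cells containing at least one feature, where
    points is an (N, 2) array of (x, y) pixel coordinates. Values near
    1 indicate well-spread features; low values indicate clustering,
    e.g. all remaining features on a single riverbank."""
    h, w = frame_shape[:2]
    rows = np.clip((points[:, 1] / h * grid[0]).astype(int), 0, grid[0] - 1)
    cols = np.clip((points[:, 0] / w * grid[1]).astype(int), 0, grid[1] - 1)
    occupied = np.zeros(grid, dtype=bool)
    occupied[rows, cols] = True
    return float(occupied.mean())

# Frames scoring below some threshold (e.g. 0.3) could be flagged
# before the estimated transformation is trusted.
```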
This indicates that manual feature selection tools are more likely to perform robustly in complex conditions, as they allow the operator to heuristically choose those static features which are present throughout the entire video and which are relatively evenly spread out around the ROI. In a way, tools using automatic feature selection resemble hunting with a shotgun – simple and spread out – while those using manual selection resemble a sniper rifle – more manual input, but focused. However, this says nothing of the supremacy of one approach over the other until we know what is being hunted and from which distance.
When the previously described conditions are met, state-of-the-art automatic feature selection tools coupled with efficient and accurate feature matching are likely to achieve higher stabilisation accuracy, while the manual feature selection approaches are somewhat less accurate but can ensure stabilisation in complex flight and ground conditions. Since no tool provided vastly superior stabilisation accuracy in all three case studies, other aspects of the investigated tools have been summarised in Table 7. Apart from stabilisation accuracy and robustness, we propose the following considerations when choosing the optimal tool, in order of importance:

1. Complexity: The most efficient tools in terms of required processing time were FFT-CUAS and SSIMS. A general conclusion is that the computational (time) complexity of automatic tools can be several times higher than that of manual tools, due to the higher number of features required to obtain adequate results, which should be considered when long high-resolution videos are stabilised. Finally, the off-the-shelf software Blender was found to require more computer resources than the specialised tools, due to its general-purpose nature and broader spectrum of capabilities.

2. User experience: As shown in Table A1 in the Appendices, automatic tools require fewer user-provided parameters, which is why they require somewhat less operator experience to ensure adequate results. The specialised stabilisation tools used in this research have also been easier to use than the off-the-shelf software Blender (with both manual and automatic feature selection). Some tools, such as FFT-CUAS and SSIMS, offer a choice of different image transformation methods, which can be beneficial for more experienced users. Finally, while some tools are currently available only as console applications (FFT-CUAS and FAST), others also deliver graphical interfaces to enhance the overall user experience.

Stabilisation tools were compared quantitatively for the first two case studies and qualitatively for the case study in challenging conditions. The quantitative comparison was based on two metrics: the SVF analysis illustrated the character and intensity of the spatial distribution of residual motion of static features in the stabilised images, while pixel-intensity root-mean-squared differences (RMSD) were used for the estimation of the temporal distribution of residual motion.
The performance of AKAZE, FFT-CUAS, KLT-IV, and SSIMS was consistent in the first two case studies. The average residual displacement of the static features after stabilisation was in the subpixel range, mostly below 0.5 px. In the case study where the riverbanks had large differences in elevation, more residual motion of static features was observed in the sub-region where the difference in elevation between the riverbanks and the water surface was higher. Thus, this study confirmed the importance of placing the GCPs used for 2D stabilisation at the same elevation as the water surface, or as close to it as possible.
In the third case study, FFT-CUAS and SSIMS stabilised the video adequately and completely. With both feature selection modes (manual and automatic), stabilisation with Blender led to a reduction of contrast in the output images, reducing the visibility of traceable features on the water surface. The influence of the colour grading issues induced by Blender on image velocimetry results requires further analysis.
Overall, approaches with automatic feature selection, including the additionally tested Blender/A, were found to be less robust in complex video recording conditions.

The comparison presented in this paper did not aim at finding the best stabilisation tool for every occasion, but rather at analysing the limitations of freely available tools and juxtaposing their performance in different test conditions. Future research will aim at evaluating the influence of image stabilisation quality, including the magnitude and the type of residual apparent motion, on image velocimetry results in different flow conditions.