Review of GrainNet.
This revised manuscript has many improvements on issues like model architecture and training. Class activation maps also provide a very useful insight into how the model works. Overall, this method is very innovative and it produces results that are of high quality and very difficult to achieve with other methods. However, the authors still do not clearly acknowledge the limitations of their method, and this rests on two points: an unclear understanding of the logistical costs of acquiring sufficient data for GrainNet with a UAV, and a false result for their so-called geographic cross validation.
First, the authors begin their response letter by stating that they do not see an issue with the acquisition of drone data at 0.25 cm of spatial resolution and characterise it as a ‘minor technical detail’. I will therefore clarify my comment with a worked example. Start with a 1 hectare (100 x 100 m) bar as a unit sampling area. The project uses a DJI P4 Pro; let us simplify the problem by assuming a 90 degree FOV, meaning that the image footprint width is twice the flying altitude. The images were acquired at 5472x3648 pixels (a 3:2 aspect ratio). At 0.25 cm per pixel the image footprint is therefore about 13.7 m x 9.1 m, from which we can derive that the drone was approximately 6.8 m above ground. Given that the method needs an orthomosaic, I will assume that the images are flown at 80% forward overlap with a 50% sidelap. Between images the drone must move 20% of the 9.1 m image height to get the 80% overlap. This is 1.82 m. On the P4 Pro, even with the fastest SD card on the market, you need to leave about 2 s for the image to write to disk; anything less and the drone will start missing images during the mission. So the optimal flight method is a slow continuous motion of the drone. 1.82 m in 2 s is 0.9 m/s. It will therefore take roughly 110 s, or about 1.8 minutes, to complete one line of 100 m. With the images being about 13.7 m wide and a 50% sidelap, the flight lines are spaced about 6.8 m apart and we need about 15 of them to cover 1 hectare, for a total flight time of about 27 minutes/hectare.
Now consider the setup used to get grain size data for alternative texture mapping methods. In this case imagery acquired at 2-3 cm of spatial resolution is suitable, and flying a P4 Pro at 50 m altitude will deliver imagery at about 2 cm. At 50 m, the footprint of one image is about 100 m x 67 m. At 80% forward overlap and the same 2 s interval between images, the drone flies at 6.7 m/s. Given the image width and the same 50% sidelap, we only need 2 lines to cover the hectare, meaning that the total operation needs only about 30 seconds/hectare.
Therefore, data acquisition for GrainNet requires drone operations that are roughly 50x longer than for older methods. That is not trivial and readers deserve to know this fact.
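The worked example above can be recomputed from its stated assumptions with a short sketch. It assumes a 5472x3648 pixel frame, a 90 degree FOV so that the footprint width is twice the altitude, 80% forward overlap, 50% sidelap and one image every 2 s. The function and variable names are my own, and turns between lines, take-off and battery changes are ignored, so a real mission would take longer. The figures in the text are rounded by hand, so the printed values should be read as a check on their order of magnitude rather than a digit-for-digit match.

```python
import math

PX_WIDE, PX_HIGH = 5472, 3648  # frame size quoted in the text


def altitude_for_gsd(gsd_m):
    """Altitude (m) that yields the requested ground sampling distance,
    assuming a 90 degree FOV (footprint width = 2 x altitude)."""
    return gsd_m * PX_WIDE / 2.0


def flight_plan(altitude_m, site_m=100.0, forward_overlap=0.80,
                sidelap=0.50, write_interval_s=2.0):
    """Estimate line count and flight time for a square site.

    Simplification (assumption): only straight-line flying is counted;
    turns, take-off and landing are ignored.
    """
    width_m = 2.0 * altitude_m                      # across-track footprint
    height_m = width_m * PX_HIGH / PX_WIDE          # along-track footprint
    advance_m = height_m * (1.0 - forward_overlap)  # distance between shots
    speed_ms = advance_m / write_interval_s         # one shot per write interval
    spacing_m = width_m * (1.0 - sidelap)           # distance between lines
    # round() guards against floating-point noise before taking the ceiling
    lines = math.ceil(round(site_m / spacing_m, 9))
    total_s = lines * site_m / speed_ms
    return {
        "gsd_cm": 100.0 * width_m / PX_WIDE,
        "speed_ms": speed_ms,
        "lines": lines,
        "total_s": total_s,
    }


grainnet = flight_plan(altitude_for_gsd(0.0025))  # 0.25 cm imagery
coarse = flight_plan(50.0)                        # 50 m altitude, about 2 cm imagery

for name, plan in (("0.25 cm survey", grainnet), ("50 m survey", coarse)):
    print(f"{name}: {plan['gsd_cm']:.2f} cm/px, {plan['speed_ms']:.2f} m/s, "
          f"{plan['lines']} lines, {plan['total_s'] / 60:.1f} min/ha")
print(f"time ratio: {grainnet['total_s'] / coarse['total_s']:.0f}x")
```

Changing `write_interval_s` or the overlaps shows how sensitive the comparison is to each assumption; the ratio between the two surveys is driven almost entirely by the difference in footprint size.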
The authors' suggestion that magnification can solve the problem is incorrect. When you magnify, you increase the focal length and thus reduce the image footprint, so the flight velocity for SfM acquisition remains the same. Whilst it is true that a higher resolution camera could indeed improve things, that trend is very slow. The current UAV market for science is now dominated by consumer, non-scientific drones made by DJI. The simple reason is cost. The P4 Pro resolution of 20 Mpix is already on the high side. The only way to improve the performance would be to use top of the line cameras that have high speed writing buses. For example, mounting a Canon EOS on a big drone like a Matrice 600 would indeed be much faster, but then you are talking about an order of magnitude increase in cost for drone equipment. Either way, the acquisition of appropriate data for GrainNet is a significant barrier to access.
The second issue is the geographic cross validation. My view is still that the authors' approach is mistaken and unjustified in the geomorphology literature. The authors state on line 564 that there is no strong correlation between grain sizes on the same rivers. This statement is not evidenced and it flies in the face of decades of fluvial geomorphology. It has long been known that grain size decreases exponentially with distance downstream, with periodic discontinuities (Rice, 1999; Rice and Church, 1998, 2001). This was again observed in recent remote sensing studies (Carbonneau et al., 2005). So barring the incidence of a source of coarse grained material, two successive bars on the same river can be expected to have a similar grain size and to be composed of similar material from the same source. So unless the authors can show that between each and every one of their sampling bars there is a new input of sediment, we must expect that the majority of neighbouring bars in the dataset are similar and that LOOCV is not an appropriate method. I again make the request that the authors revise this process to hold out entire rivers.
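To make the request concrete, a river-wise hold-out could be organised along the lines of the minimal sketch below. The river labels and the list of bars are hypothetical placeholders, not the authors' dataset, and the function name is my own; any grouped cross-validation routine that keeps all bars of a river in the same fold would serve the same purpose.

```python
from collections import defaultdict


def leave_one_river_out(river_ids):
    """Yield (held_out_river, train_indices, test_indices), one fold per river.

    river_ids[i] is the river on which bar i was sampled. Every bar from the
    held-out river goes to the test set, so no river contributes to both sets.
    """
    by_river = defaultdict(list)
    for idx, river in enumerate(river_ids):
        by_river[river].append(idx)
    for river in sorted(by_river):
        test_idx = by_river[river]
        train_idx = [i for i, r in enumerate(river_ids) if r != river]
        yield river, train_idx, test_idx


# Hypothetical example: six bars sampled on three rivers.
bar_rivers = ["A", "A", "A", "B", "B", "C"]
folds = list(leave_one_river_out(bar_rivers))

for river, train_idx, test_idx in folds:
    print(f"hold out river {river}: train on bars {train_idx}, test on bars {test_idx}")
```

With bar-wise LOOCV, by contrast, each test bar would usually have neighbouring bars from the same river in the training set, which is exactly the leakage described above.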
This is critical because, as it stands, this method does provide unprecedented data over a gravel surface but, as I show above, the logistical costs of data acquisition are an order of magnitude more in time or cost when compared to older methods. If it turns out that the method does not generalise to new rivers, then local calibration will be needed at each acquisition, thus increasing the total cost of the method. I do not doubt that in certain applications such a large field effort will be justified in order to produce such high quality outputs, but the reader deserves to get a clear indication of these costs upfront.