Advancing flow duration curve prediction in ungauged basins using machine learning and deep learning
Abstract. The flow duration curve (FDC) represents the distribution of streamflow, providing vital information for managing river systems. Constructing an FDC is especially challenging in ungauged basins, where streamflow data are lacking. This study addresses this gap by using machine learning (ML) and deep learning (DL) models to predict FDCs in ungauged basins. The objectives are to (a) identify influential hydrologic, meteorological, and topographic factors, (b) evaluate various combinations of predictor variables, (c) assess the effects of different precipitation metrics on flow predictions, and (d) compare ML and DL model performance. We developed and evaluated random forest (RF), deep neural network (DNN), support vector regression (SVR), and elastic net regression (ENR) models using historical data from 140 streamflow stations. Feature importance analysis revealed that watershed area and precipitation were the key factors for high discharge percentiles, whereas land use and basin characteristics gained greater importance for medium and low flows. Scenario analysis showed that combining all variables yielded the highest accuracy in predicting FDCs. The choice of precipitation metric had minimal impact on streamflow predictions, indicating that other factors played a more significant role. The DNN outperformed RF, SVR, and ENR in predicting low (Q95), medium (Q50), and high (Q5) flows, achieving on average a coefficient of determination that was 8.03 % higher, a root mean square error that was 227.4 % lower, and a standard deviation that was 46.4 % lower. This study demonstrates the effectiveness of advanced ML and DL approaches for predicting FDCs in ungauged basins, offering a foundation for advancing hydrological prediction.
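For readers unfamiliar with the Q5, Q50, and Q95 notation, the following is a minimal sketch of how such FDC points can be read from a gauged flow record. It assumes the common convention that Qp is the flow equalled or exceeded p % of the time, so Q5 is a high flow and Q95 a low flow, and it uses linear interpolation between ranked values. The function name `flow_duration_quantiles` and the synthetic series are illustrative assumptions, not the study's own procedure or data.

```python
def flow_duration_quantiles(flows, exceedance_pcts=(5, 50, 95)):
    """Return {"Q5": ..., "Q50": ..., "Q95": ...} for a flow record.

    Assumption: Qp is the flow equalled or exceeded p % of the time,
    i.e. the (100 - p)th percentile of the record.
    """
    # Sort ascending; `q == q` is False for NaN, so missing values are dropped.
    ordered = sorted(q for q in flows if q == q)
    if not ordered:
        raise ValueError("no valid flow values")
    n = len(ordered)
    result = {}
    for p in exceedance_pcts:
        # Fractional rank of the (100 - p)th percentile in the sorted record.
        pos = (n - 1) * (100 - p) / 100
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring ranked flows.
        result[f"Q{p}"] = ordered[lo] + frac * (ordered[hi] - ordered[lo])
    return result


# Synthetic record (hypothetical values, not data from the study).
daily_flows = [float(i) for i in range(1, 102)]
fdc = flow_duration_quantiles(daily_flows)
print(fdc)
```

In an ungauged basin no such record exists, which is why the study predicts these quantiles from basin and climate descriptors instead of computing them directly.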