<p>Design flood estimation is a fundamental task in hydrology. In this research, we propose a machine learning based approach to estimate design floods globally. The approach involves three stages: (i) estimating at-site flood frequency curves for gauging stations worldwide using the Anderson-Darling test and a Bayesian MCMC method; (ii) clustering these stations into subgroups with a K-means model based on twelve globally available catchment descriptors; and (iii) developing a regression model in each subgroup for regional design flood estimation using the same descriptors. A total of 11793 stations worldwide were selected for model development, and three widely used regression models were compared for design flood estimation. The results showed that: (1) the proposed approach achieved the highest accuracy for design flood estimation when all twelve descriptors were used for clustering, and regression performance improved as more descriptors were considered during training and validation; (2) support vector machine regression provided the highest prediction performance among the regression models tested, with a root mean square normalised error (RMSNE) of 0.708 for 100-year return period flood estimation; (3) 100-year design floods in the tropical, arid, temperate, cold and polar climate zones could be reliably estimated, with mean relative biases (RBIAS) of −0.199, −0.233, −0.169, 0.179 and −0.091, respectively; (4) the machine learning based approach shows considerable improvement over the index-flood based method introduced by Smith et al. (2015, <a href="https://doi.org/10.1002/2014WR015814" target="_blank">https://doi.org/10.1002/2014WR015814</a>) for design flood estimation at the global scale, and the average RBIAS is less than 18 % for 10, 20, 50 and 100-year design floods.
We conclude that the proposed approach is a valid method for estimating design floods anywhere on the global river network, improving flood hazard prediction, especially in ungauged areas.</p>
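<p>The abstract does not state which distribution or priors are used in stage (i). The following is a minimal sketch of an at-site Bayesian fit, assuming a Gumbel distribution has already been selected (the Anderson-Darling selection step is not reproduced), flat priors on the location and on the log of the scale, and a random-walk Metropolis sampler as a stand-in for the Bayesian MCMC method. The function names, step sizes and synthetic 60-year record are hypothetical illustrations, not the authors' settings.</p>

```python
import math
import random
import statistics


def gumbel_loglik(data, mu, beta):
    """Log-likelihood of annual maxima under a Gumbel(mu, beta) distribution."""
    if beta <= 0:
        return -math.inf
    total = 0.0
    for x in data:
        z = (x - mu) / beta
        try:
            total += -math.log(beta) - z - math.exp(-z)
        except OverflowError:
            return -math.inf
    return total


def metropolis_gumbel(data, n_iter=20000, burn_in=5000, seed=1):
    """Random-walk Metropolis on (mu, log beta); flat priors in that space are an assumption."""
    rng = random.Random(seed)
    # Method-of-moments starting values for the Gumbel distribution.
    beta = statistics.stdev(data) * math.sqrt(6) / math.pi
    mu = statistics.mean(data) - 0.5772 * beta
    log_beta = math.log(beta)
    current = gumbel_loglik(data, mu, beta)
    step_mu, step_lb = 0.1 * beta, 0.1
    samples = []
    for i in range(n_iter):
        mu_new = mu + rng.gauss(0.0, step_mu)
        lb_new = log_beta + rng.gauss(0.0, step_lb)
        proposed = gumbel_loglik(data, mu_new, math.exp(lb_new))
        # Symmetric proposal + flat priors: accept with probability min(1, likelihood ratio).
        if math.log(1.0 - rng.random()) < proposed - current:
            mu, log_beta, current = mu_new, lb_new, proposed
        if i >= burn_in:
            samples.append((mu, math.exp(log_beta)))
    return samples


def design_flood_quantiles(samples, return_period):
    """Posterior samples of the T-year flood x_T = mu + beta * y, y = -ln(-ln(1 - 1/T))."""
    y = -math.log(-math.log(1.0 - 1.0 / return_period))
    return [mu + beta * y for mu, beta in samples]


# Hypothetical 60-year annual-maximum series drawn from Gumbel(mu=100, beta=30) by inverse CDF.
gen = random.Random(0)
annual_max = [100 - 30 * math.log(-math.log(1.0 - gen.random())) for _ in range(60)]

samples = metropolis_gumbel(annual_max)
q100 = statistics.median(design_flood_quantiles(samples, 100))
print(round(q100, 1))
```

<p>Taking the posterior median of the quantile samples is one reporting choice; credible intervals for the flood frequency curve can be read from the same samples at each return period.</p>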
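<p>Stages (ii) and (iii) pair a clustering step with one regression model per subgroup. A minimal sketch with scikit-learn follows, assuming standardized descriptors, a log-transformed design flood target, an RBF-kernel support vector regression with hand-picked hyperparameters, and synthetic data in place of the twelve real catchment descriptors. The function names, the number of clusters and the value of <code>C</code> are illustrative assumptions, not the paper's configuration.</p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_regional_models(descriptors, log_q, n_clusters=5, seed=0):
    """Cluster stations on standardized descriptors, then fit one SVR per cluster."""
    scaler = StandardScaler().fit(descriptors)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scaler.transform(descriptors))
    models = {}
    for k in range(n_clusters):
        mask = labels == k
        # Each subgroup gets its own regression of the log design flood on the descriptors.
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
        model.fit(descriptors[mask], log_q[mask])
        models[k] = model
    return scaler, kmeans, models


def predict_design_flood(scaler, kmeans, models, descriptors):
    """Assign each (possibly ungauged) site to a cluster and apply that cluster's model."""
    labels = kmeans.predict(scaler.transform(descriptors))
    log_pred = np.empty(len(descriptors))
    for k, model in models.items():
        mask = labels == k
        if mask.any():
            log_pred[mask] = model.predict(descriptors[mask])
    return np.exp(log_pred)


# Synthetic stand-in: 300 stations with 12 hypothetical descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 12))
log_q = 2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

scaler, km, models = fit_regional_models(X, log_q, n_clusters=4)
q_pred = predict_design_flood(scaler, km, models, X[:5])
print(q_pred)
```

<p>Only the descriptors are needed to route a new site through <code>kmeans.predict</code> to its subgroup model, which is what makes this kind of regional scheme applicable to ungauged catchments.</p>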
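<p>The abstract reports RMSNE and RBIAS without defining them. A minimal sketch follows, assuming the common relative-error definitions RBIAS = mean((Q<sub>pred</sub> − Q<sub>obs</sub>) / Q<sub>obs</sub>) and RMSNE = sqrt(mean(((Q<sub>pred</sub> − Q<sub>obs</sub>) / Q<sub>obs</sub>)<sup>2</sup>)); under this assumed definition, a negative RBIAS indicates underestimation. The function names are hypothetical.</p>

```python
import math


def _relative_errors(observed, predicted):
    """Per-site relative errors (Q_pred - Q_obs) / Q_obs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    return [(p - o) / o for o, p in zip(observed, predicted)]


def rbias(observed, predicted):
    """Mean relative bias (assumed definition): mean of the relative errors."""
    errors = _relative_errors(observed, predicted)
    return sum(errors) / len(errors)


def rmsne(observed, predicted):
    """Root mean square normalised error (assumed definition)."""
    errors = _relative_errors(observed, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


obs = [100.0, 200.0]
pred = [80.0, 220.0]
print(rbias(obs, pred))  # ≈ -0.05
print(rmsne(obs, pred))  # ≈ 0.158
```

<p>Because both metrics are normalised by the observed flood, stations of very different size contribute comparably, which suits comparisons across climate zones.</p>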