Articles | Volume 30, issue 9
https://doi.org/10.5194/hess-30-2651-2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Introducing the Model Fidelity Metric (MFM) for robust and diagnostic land surface model evaluation
Zezhen Wu
School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai, China
Zhongwang Wei
CORRESPONDING AUTHOR
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Xingjie Lu
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Nan Wei
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Lu Li
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Shupeng Zhang
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Hua Yuan
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Shaofeng Liu
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Yongjiu Dai
CORRESPONDING AUTHOR
School of Atmospheric Sciences, Sun Yat-sen University, Zhuhai, China
Related authors
Tingting Wu, Shupeng Zhang, Xiaofan Yang, and Yongjiu Dai
EGUsphere, https://doi.org/10.5194/egusphere-2026-2275, 2026
This preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).
Short summary
We coupled a land surface model (CoLM) with an analysis tool (PSUADE) to evaluate processes controlling water table depth (WTD) through sensitivity analysis and parameter optimization. A small set of sensitive parameters governing WTD was identified, showing subsurface runoff controls average groundwater levels, while soil properties govern variability. A stepwise calibration was then suggested to improve accuracy. This framework offers a transferable pathway to enhance groundwater simulation.
Aoqi Sun, Wenjie Xu, Enze Ma, Hua Yuan, and Chen Yang
EGUsphere, https://doi.org/10.5194/egusphere-2026-1102, 2026
This preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).
Short summary
Groundwater is often viewed as a hidden reserve that supports evapotranspiration and streamflow during dry periods. We show that sustained warming and greening can weaken this buffering role. As groundwater levels decline, links between shallow and deeper stores reorganize, reducing older groundwater inputs to streams and evapotranspiration. Over time, water cycling shifts toward shallower, faster pathways, potentially lowering system resilience and predictability under long-term climate stress.
Chen Yang, Aoqi Sun, Shupeng Zhang, Yongjiu Dai, Stefan Kollet, and Reed Maxwell
Geosci. Model Dev., 19, 1849–1866, https://doi.org/10.5194/gmd-19-1849-2026, 2026
Short summary
Groundwater plays a key role in land–atmosphere water and energy exchange, yet it is often simplified in large-scale Earth system models. We review 20 years of efforts to couple the groundwater model ParFlow with land surface and atmospheric models, showing how groundwater dynamics shape terrestrial fluxes. We also present an updated coupling framework that enhances model performance and flexibility, and outline a modular strategy to guide future development.
Wenhong Wang, Shiao Feng, Yonggen Zhang, Zhongwang Wei, Jianzhi Dong, Lutz Weihermüller, Cong-Qiang Liu, and Harry Vereecken
Earth Syst. Sci. Data, 18, 1061–1088, https://doi.org/10.5194/essd-18-1061-2026, 2026
Short summary
Current soil moisture data often suffers from gaps or errors. We combined the long-term coverage of ERA5-Land with the high accuracy of SMAP (Soil Moisture Active Passive) satellites to create a corrected global moisture dataset spanning 1950–2025. Validated against 3.8 million ground measurements, our product reduces errors by ~ 25 % in the modern period (2015–2020) and maintains ~ 20 % improvement historically (1960–2015). This reliable, daily 75-year record is essential for monitoring long-term climate trends and droughts.
Wanyi Lin, Hua Yuan, Wenzong Dong, Zhuo Liu, Jiayi Xiang, Xinran Yu, Shupeng Zhang, Zhongwang Wei, and Yongjiu Dai
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2026-21, 2026
Preprint under review for ESSD
Short summary
Land surface models require information on vegetation composition and seasonal leaf area dynamics, yet long-term high-resolution datasets remain limited. We present a global dataset of plant functional types and corresponding leaf area index for 1985–2020, derived from multiple satellite observations. The dataset reduces uncertainties in vegetation representation and improves the description of phenological dynamics, supporting more realistic land surface and climate modeling.
Zhongwang Wei, Qingchen Xu, Fan Bai, Xionghui Xu, Zixin Wei, Wenzong Dong, Hongbin Liang, Nan Wei, Xingjie Lu, Lu Li, Shupeng Zhang, Hua Yuan, Laibao Liu, and Yongjiu Dai
Geosci. Model Dev., 18, 6517–6540, https://doi.org/10.5194/gmd-18-6517-2025, 2025
Short summary
Land surface models are used for simulating how Earth's surface interacts with the atmosphere. As models grow more complex and detailed, researchers need better tools to evaluate their performance. OpenBench, a new software system that makes the evaluation process more comprehensive and efficient, stands out by incorporating various factors and working with data at any scale, enabling scientists to incorporate new types of models and measurements as our understanding of Earth's systems evolves.
Shuyang Guo, Yongjiu Dai, Hua Yuan, and Hongbin Liang
The Cryosphere, 19, 3553–3570, https://doi.org/10.5194/tc-19-3553-2025, 2025
Short summary
The Snow, Ice, and Aerosol Radiation Model version 4 has only been used to evaluate bare-ice albedo in land surface models, with necessary ice property data lacking quality control. We integrated this model into our land surface model and improved bare-ice properties using quality-controlled satellite data. Our findings show regional warming and reduced snow cover in Greenland’s bare-ice region, driven by changes in bare-ice properties through bare-ice–snow albedo feedback.
Shulei Zhang, Hongbin Liang, Fang Li, Xingjie Lu, and Yongjiu Dai
Hydrol. Earth Syst. Sci., 29, 3119–3143, https://doi.org/10.5194/hess-29-3119-2025, 2025
Short summary
This study enhances irrigation modeling in the Common Land Model by capturing the full irrigation process, detailing water supplies from various sources, and enabling bidirectional coupling between water demand and supply. The proposed model accurately simulates irrigation water withdrawals, energy fluxes, river flow, and crop yields. It offers insights into irrigation-related climate impacts and water scarcity, contributing to sustainable water management and improved Earth system modeling.
Weilin Liao, Yanman Li, Xiaoping Liu, Yuhao Wang, Yangzi Che, Ledi Shao, Guangzhao Chen, Hua Yuan, Ning Zhang, and Fei Chen
Earth Syst. Sci. Data, 17, 2535–2551, https://doi.org/10.5194/essd-17-2535-2025, 2025
Short summary
The currently available urban canopy parameter (UCP) datasets are limited to just a few cities for urban climate simulations by the Weather Research and Forecasting (WRF) model. To address this gap, we develop a global 1 km spatially continuous UCP dataset (GloUCP) which provides superior spatial coverage and higher accuracy in capturing urban morphology across diverse regions. It has great potential to support further advancements in urban climate modeling and related applications.
Chen Yang, Zitong Jia, Wenjie Xu, Zhongwang Wei, Xiaolang Zhang, Yiguang Zou, Jeffrey McDonnell, Laura Condon, Yongjiu Dai, and Reed Maxwell
Hydrol. Earth Syst. Sci., 29, 2201–2218, https://doi.org/10.5194/hess-29-2201-2025, 2025
Short summary
We developed the first high-resolution, integrated surface water–groundwater hydrologic model of the entirety of continental China using ParFlow. The model shows good performance in terms of streamflow and water table depth when compared to global data products and observations. It is essential for water resources management and decision-making in China within a consistent framework in a changing world. It also has significant implications for similar modeling elsewhere in the world.
Gaosong Shi, Wenye Sun, Wei Shangguan, Zhongwang Wei, Hua Yuan, Lu Li, Xiaolin Sun, Ye Zhang, Hongbin Liang, Danxi Li, Feini Huang, Qingliang Li, and Yongjiu Dai
Earth Syst. Sci. Data, 17, 517–543, https://doi.org/10.5194/essd-17-517-2025, 2025
Short summary
In this study, we developed the second version of China's high-resolution soil information grid using legacy soil samples and advanced machine learning. This version predicts over 20 soil properties at six depths, providing accurate soil variation maps across China. It outperforms previous versions and global products, offering valuable data for hydrological and ecological analyses and Earth system modelling, enhancing our understanding of soil roles in environmental processes.
Jiahao Shi, Hua Yuan, Wanyi Lin, Wenzong Dong, Hongbin Liang, Zhuo Liu, Jianxin Zeng, Haolin Zhang, Nan Wei, Zhongwang Wei, Shupeng Zhang, Shaofeng Liu, Xingjie Lu, and Yongjiu Dai
Earth Syst. Sci. Data, 17, 117–134, https://doi.org/10.5194/essd-17-117-2025, 2025
Short summary
Flux tower data are widely recognized as benchmarking data for land surface models, but insufficient emphasis on, and deficiencies in, site attribute data limit their true value. We collect site-observed vegetation, soil, and topography data from various sources. The final dataset encompasses 90 sites globally, with relatively complete site attribute data and high-quality flux validation data. This work provides more reliable site attribute data, benefiting land surface model development.
Yangzi Che, Xuecao Li, Xiaoping Liu, Yuhao Wang, Weilin Liao, Xianwei Zheng, Xucai Zhang, Xiaocong Xu, Qian Shi, Jiajun Zhu, Honghui Zhang, Hua Yuan, and Yongjiu Dai
Earth Syst. Sci. Data, 16, 5357–5374, https://doi.org/10.5194/essd-16-5357-2024, 2024
Short summary
Most existing building height products are limited with respect to either spatial resolution or coverage, not to mention the spatial heterogeneity introduced by global building forms. Using Earth Observation (EO) datasets for 2020, we developed a global height dataset at the individual building scale. The dataset provides spatially explicit information on 3D building morphology, supporting both macro- and microanalysis of urban areas.
Liqing Peng, Justin Sheffield, Zhongwang Wei, Michael Ek, and Eric F. Wood
Earth Syst. Dynam., 15, 1277–1300, https://doi.org/10.5194/esd-15-1277-2024, 2024
Short summary
Integrating evaporative demand into drought indicators is effective, but the choice of method and the effectiveness of surface features remain undocumented. We evaluate various methods and surface features for predicting soil moisture dynamics. Using minimal ancillary information alongside meteorological and vegetation data, we develop a simple land-cover-based method that improves soil moisture drought predictions, especially in forests, showing promise for better real-time drought forecasting.
Bangjun Cao, Yaping Shao, Xianyu Yang, Xin Yin, and Shaofeng Liu
Atmos. Chem. Phys., 24, 275–285, https://doi.org/10.5194/acp-24-275-2024, 2024
Short summary
Our novel scheme enhances large-eddy simulations (LESs) for atmosphere–land interactions. It couples LES subgrid closure with Monin–Obukhov similarity theory (MOST), overcoming MOST's limitations. Validated over diverse land surfaces, our approach outperforms existing methods, aligning well with field measurements. Robustness is demonstrated across varying model resolutions. MOST's influence strengthens with decreasing grid spacing, particularly for sensible heat flux.
Qingliang Li, Gaosong Shi, Wei Shangguan, Vahid Nourani, Jianduo Li, Lu Li, Feini Huang, Ye Zhang, Chunyan Wang, Dagang Wang, Jianxiu Qiu, Xingjie Lu, and Yongjiu Dai
Earth Syst. Sci. Data, 14, 5267–5286, https://doi.org/10.5194/essd-14-5267-2022, 2022
Short summary
SMCI1.0 is a 1 km resolution dataset of daily soil moisture over China for 2000–2020 derived through machine learning trained with in situ measurements from 1789 stations, meteorological forcings, and land surface variables. It contains 10 soil layers at 10 cm intervals down to 100 cm depth. Evaluated against in situ data, the error (ubRMSE) ranges from 0.045 to 0.051, and the correlation (R) ranges from 0.866 to 0.893. Compared with ERA5-Land, SMAP-L4, and SoMo.ml, SMCI1.0 has higher accuracy and resolution.
Yi Nan, Zhihua He, Fuqiang Tian, Zhongwang Wei, and Lide Tian
Hydrol. Earth Syst. Sci., 26, 4147–4167, https://doi.org/10.5194/hess-26-4147-2022, 2022
Short summary
Tracer-aided hydrological models are a useful tool for reducing the uncertainty of hydrological modeling in cold basins, but there is little guidance on the sampling strategy for isotope analysis, which is important for large mountainous basins. This study evaluated how tracer-aided modeling performance depends on the availability of isotope data in the Yarlung Tsangpo river basin and provides implications for collecting water isotope data for running tracer-aided hydrological models.
Ziqi Lin, Yongjiu Dai, Umakant Mishra, Guocheng Wang, Wei Shangguan, Wen Zhang, and Zhangcai Qin
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2022-232, 2022
Manuscript not accepted for further review
Short summary
Spatial soil organic carbon (SOC) data are critical for predicting carbon–climate feedbacks and future climate trends, but no conclusion has yet been reached on which dataset should be used for specific purposes. We evaluated the SOC estimates from five widely used global soil datasets and a regional permafrost dataset and identified uncertainties in SOC estimates by region, biome, and data source, hoping to help improve SOC and soil data in the future.
Shuang Ma, Lifen Jiang, Rachel M. Wilson, Jeff P. Chanton, Scott Bridgham, Shuli Niu, Colleen M. Iversen, Avni Malhotra, Jiang Jiang, Xingjie Lu, Yuanyuan Huang, Jason Keller, Xiaofeng Xu, Daniel M. Ricciuto, Paul J. Hanson, and Yiqi Luo
Biogeosciences, 19, 2245–2262, https://doi.org/10.5194/bg-19-2245-2022, 2022
Short summary
The relative ratio of wetland methane (CH4) emission pathways determines how much CH4 is oxidized before leaving the soil. We found an ebullition modeling approach that has a better performance in deep layer pore water CH4 concentration. We suggest using this approach in land surface models to accurately represent CH4 emission dynamics and response to climate change. Our results also highlight that both CH4 flux and belowground concentration data are important to constrain model parameters.
Yaoping Wang, Jiafu Mao, Mingzhou Jin, Forrest M. Hoffman, Xiaoying Shi, Stan D. Wullschleger, and Yongjiu Dai
Earth Syst. Sci. Data, 13, 4385–4405, https://doi.org/10.5194/essd-13-4385-2021, 2021
Short summary
We developed seven global soil moisture datasets (1970–2016, monthly, half-degree, and multilayer) by merging a wide range of data sources, including in situ and satellite observations, reanalysis, offline land surface model simulations, and Earth system model simulations. Given the great value of long-term, multilayer, gap-free soil moisture products to climate research and applications, we believe this paper and the presented datasets would be of interest to many different communities.
Short summary
Land surface models simulate exchanges at the Earth's surface. Traditional evaluation methods can be misleading because they may hide errors or be overly sensitive to outliers. We introduce the Model Fidelity Metric, which measures model performance in terms of accuracy, variability and distribution similarity. Tests with synthetic data and real streamflow observations show that this metric provides more stable and informative assessments of model performance.
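To make the three aspects named in the summary concrete, the sketch below computes one generic diagnostic for each: a scaled error for accuracy, a standard-deviation ratio for variability, and a histogram overlap for distribution similarity. This is an illustration under stated assumptions only. The function name `diagnostic_components`, the choice of diagnostics, and the bin count are hypothetical and are not the published MFM formulation, which this page does not specify.

```python
import math
import statistics


def diagnostic_components(sim, obs, bins=10):
    """Return three illustrative diagnostics for paired series.

    Hypothetical stand-ins for accuracy, variability and distribution
    similarity; NOT the MFM formula from the paper.
    """
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must be paired series of length >= 2")

    obs_sd = statistics.pstdev(obs)
    if obs_sd == 0:
        raise ValueError("observations must not be constant")

    # Accuracy: root-mean-square error scaled by the observed spread (0 is ideal).
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))
    scaled_rmse = rmse / obs_sd

    # Variability: simulated over observed standard deviation (1 is ideal).
    sd_ratio = statistics.pstdev(sim) / obs_sd

    # Distribution similarity: overlap of normalized histograms built on
    # shared, equal-width bins (1 is ideal, 0 means no overlap).
    lo = min(min(sim), min(obs))
    hi = max(max(sim), max(obs))
    width = (hi - lo) / bins  # positive because obs is not constant

    def hist(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    overlap = sum(min(p, q) for p, q in zip(hist(sim), hist(obs)))

    return {
        "scaled_rmse": scaled_rmse,
        "sd_ratio": sd_ratio,
        "hist_overlap": overlap,
    }


# A simulation that is shifted by a constant offset keeps the observed
# variability but loses accuracy and distribution overlap.
perfect = diagnostic_components([1, 2, 3, 4], [1, 2, 3, 4])
shifted = diagnostic_components([2, 3, 4, 5], [1, 2, 3, 4])
```

Reporting the components separately, rather than only a single aggregate score, is what lets an evaluation show which aspect of a simulation is deficient; in the shifted case the variability ratio alone would suggest a perfect model.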