<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" specific-use="SMUR" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">HESSD</journal-id>
<journal-title-group>
<journal-title>Hydrology and Earth System Sciences Discussions</journal-title>
<abbrev-journal-title abbrev-type="publisher">HESSD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci. Discuss.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1812-2116</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/hess-2024-284</article-id>
<title-group>
<article-title>Sensitivity of hydrological machine learning prediction accuracy to information quantity and quality</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Jeung</surname>
<given-names>Minhyuk</given-names>
<ext-link>https://orcid.org/0000-0001-5074-5973</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Her</surname>
<given-names>Younggu</given-names>
<ext-link>https://orcid.org/0000-0003-3700-5115</ext-link>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Baek</surname>
<given-names>Sang-Soo</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yoon</surname>
<given-names>Kwangsik</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Department of Rural &amp; Biosystems Engineering (Brain Korea 21), Chonnam National University, Gwangju 61186, Republic of Korea</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Department of Agricultural and Biological Engineering / Tropical Research and Education Center, University of Florida, Homestead, Florida 33186, USA</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Department of Environmental Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea</addr-line>
</aff>
<pub-date pub-type="epub">
<day>07</day>
<month>10</month>
<year>2024</year>
</pub-date>
<volume>2024</volume>
<fpage>1</fpage>
<lpage>26</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2024 Minhyuk Jeung et al.</copyright-statement>
<copyright-year>2024</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://hess.copernicus.org/preprints/hess-2024-284/">This article is available from https://hess.copernicus.org/preprints/hess-2024-284/</self-uri>
<self-uri xlink:href="https://hess.copernicus.org/preprints/hess-2024-284/hess-2024-284.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/preprints/hess-2024-284/hess-2024-284.pdf</self-uri>
<abstract>
<p>Machine learning (ML) is now commonly employed as a tool for hydrological prediction, owing to recent advances in computing resources and the growing volume of available data. The prediction accuracy of ML (or data-driven) models is known to improve when they are trained with additional data; however, the mechanism behind this improvement needs to be better understood and documented. This study explores the connection between the amount of information contained in the data used to train an ML model and the model&#x2019;s prediction accuracy. The amount of information was quantified using Shannon&#x2019;s information theory, including marginal and transfer entropy. Three ML models were trained to predict flow discharge and the sediment, total nitrogen, and total phosphorus loads of four watersheds. The amount of information contained in the training data was increased by sequentially adding weather data and the simulation outputs of uncalibrated and/or calibrated mechanistic (or theory-driven) models. The reliability of the training data was treated as a surrogate for information quality, and accuracy statistics were used to measure the quality (or reliability) of the uncalibrated and calibrated theory-driven modeling outputs supplied as training data for ML modeling. The results demonstrated that the prediction accuracy of hydrological ML modeling depends on both the quality and the quantity of information contained in the training data. Using all types of training data yielded the best hydrological ML prediction accuracy. ML models trained only with weather data and calibrated theory-driven modeling outputs improved accuracy most efficiently in terms of information use. This study thus illustrates how a theory-driven approach can help improve the accuracy of data-driven modeling by providing high-quality information about a system of interest.</p>
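<p>The abstract does not state which entropy estimator was used, so the following is only a minimal Python sketch, not the authors&#x2019; implementation, of how marginal entropy H(X) and transfer entropy from a source series Y to a target series X can be computed. It assumes NumPy, equal-width histogram binning with a hypothetical default of 10 bins, a history length of one time step, and the plug-in (maximum-likelihood) entropy estimator, which is biased for short records. The function names discretize, joint_entropy, marginal_entropy, and transfer_entropy are illustrative. Transfer entropy is obtained from the identity T(Y-&gt;X) = H(X_t+1, X_t) - H(X_t) - H(X_t+1, X_t, Y_t) + H(X_t, Y_t), that is, the reduction in uncertainty about the next target value gained by also knowing the current source value.</p>
<p><preformat preformat-type="code">
```python
import numpy as np


def discretize(series, n_bins=10):
    """Map a continuous series to integer labels 0..n_bins-1 with equal-width bins.

    Assumption: equal-width binning between the sample minimum and maximum.
    """
    values = np.asarray(series, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Digitizing against the interior edges keeps the maximum value in the last bin.
    return np.digitize(values, edges[1:-1])


def joint_entropy(*labels):
    """Plug-in Shannon entropy (bits) of the joint distribution of discrete label arrays."""
    stacked = np.column_stack(labels)
    # Count each distinct combination of labels (one row per time step).
    _, counts = np.unique(stacked, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def marginal_entropy(series, n_bins=10):
    """H(X) in bits for one binned series."""
    return joint_entropy(discretize(series, n_bins))


def transfer_entropy(source, target, n_bins=10):
    """T(source->target) in bits with a history length of one time step.

    Uses T = H(X_t+1, X_t) - H(X_t) - H(X_t+1, X_t, Y_t) + H(X_t, Y_t),
    where X is the target series and Y is the source series.
    """
    x = discretize(target, n_bins)
    y = discretize(source, n_bins)
    x_next, x_now, y_now = x[1:], x[:-1], y[:-1]
    return (joint_entropy(x_next, x_now) - joint_entropy(x_now)
            - joint_entropy(x_next, x_now, y_now) + joint_entropy(x_now, y_now))


# Hypothetical synthetic check: the target repeats the source with a one-step delay.
rng = np.random.default_rng(0)
source = rng.random(5000)
target = np.roll(source, 1)  # target[t + 1] equals source[t]
print(f"H(source) = {marginal_entropy(source):.2f} bits")
print(f"T(source->target) = {transfer_entropy(source, target):.2f} bits")
print(f"T(target->source) = {transfer_entropy(target, source):.2f} bits")
```
</preformat></p>
<p>In this synthetic check the target copies the source with a one-step delay, so the source-to-target transfer entropy should approach the entropy of the binned source (about log2 10 bits for evenly filled bins), whereas the target-to-source value should stay close to zero apart from the positive bias of the plug-in estimator.</p>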
</abstract>
<counts><page-count count="26"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>