Framework for developing hybrid process-driven, artificial neural network and regression models for salinity prediction in river systems
Abstract. Salinity modelling in river systems is complicated by a number of processes, including in-stream salt transport and various mechanisms of saline accession that vary dynamically as a function of water level and flow, often at different temporal scales. Traditionally, salinity models in rivers have been either process-driven or data-driven. The primary problem with process-based models is that, in many instances, not all of the underlying processes are fully understood or able to be represented mathematically. There are also often insufficient historical data to support model development. In comparison, the major limitation of data-driven models, such as artificial neural networks (ANNs), is that they provide limited system understanding and generally cannot be used to inform management decisions targeting specific processes, as the different processes are modelled implicitly. To overcome these limitations, this paper introduces and applies a generic framework for developing hybrid process-driven and data-driven models of salinity in river systems. In this approach, the most suitable sub-model is developed for each sub-process affecting salinity at the location of interest, based on model purpose, the degree of process understanding and data availability, and these sub-models are then combined to form the hybrid model. The approach is applied to a 46 km reach of the Murray River in South Australia, which is affected by high levels of salinity. In this reach, the major processes affecting salinity include in-stream salt transport, accession of saline groundwater along the length of the reach, and the flushing of three floodplain waterbodies during overbank flows of various magnitudes.
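The framework's central idea, one sub-model per sub-process combined into a single salinity prediction, can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the sub-process contributions are additive in concentration (mg/L), and the class `HybridSalinityModel`, the input names and all three placeholder sub-models and their coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A sub-model maps one day's inputs to a salinity contribution in mg/L.
Inputs = Dict[str, float]
SubModel = Callable[[Inputs], float]


@dataclass
class HybridSalinityModel:
    """Hypothetical hybrid model: one sub-model per sub-process, outputs summed."""
    components: Dict[str, SubModel]

    def contributions(self, inputs: Inputs) -> Dict[str, float]:
        # Per-process breakdown of a single prediction.
        return {name: model(inputs) for name, model in self.components.items()}

    def predict(self, inputs: Inputs) -> float:
        # ASSUMPTION: sub-process contributions combine additively in concentration.
        return sum(self.contributions(inputs).values())


# --- Hypothetical placeholder sub-models (illustrative coefficients only) ---

def in_stream_transport(x: Inputs) -> float:
    # Stand-in for a process-driven transport model: lagged upstream salinity.
    return x["upstream_salinity_lagged"]


def groundwater_accession(x: Inputs) -> float:
    # Stand-in for a data-driven (e.g. ANN) term; contribution shrinks as flow rises.
    return 200_000.0 / x["flow"]


def floodplain_flushing(x: Inputs, threshold: float = 40_000.0, slope: float = 5e-4) -> float:
    # Stand-in for a regression term that is only active above an overbank-flow threshold.
    return slope * (x["flow"] - threshold) if x["flow"] > threshold else 0.0


model = HybridSalinityModel(components={
    "in_stream": in_stream_transport,
    "groundwater": groundwater_accession,
    "floodplain": floodplain_flushing,
})
```

Because each sub-process keeps its own term, `contributions()` reports how much of a prediction is attributed to each process. This is the kind of per-process insight that a single ANN, which models the processes implicitly, cannot provide. In a real application, each placeholder would be replaced by whichever model type best suits that sub-process.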
Based on trade-offs between the degree of process understanding and data availability, a process-driven model is developed for in-stream salt transport, an ANN model is used to model saline groundwater accession, and three linear regression models are used to account for the flushing of the different floodplain storages. The resulting hybrid model performs very well on approximately 3 years of daily validation data, with a Nash–Sutcliffe efficiency (NSE) of 0.89 and a root mean squared error (RMSE) of 12.62 mg L−1 (over a range of approximately 50 to 250 mg L−1). Each component of the hybrid model noticeably improves model performance over the range of flows for which it was developed. The predictive performance of the hybrid model is significantly better than that of a benchmark process-driven model (NSE = −0.14, RMSE = 41.10 mg L−1, Gbench index = 0.90) and slightly better than that of a benchmark data-driven (ANN) model (NSE = 0.83, RMSE = 15.93 mg L−1, Gbench index = 0.36). Apart from improved predictive performance, the hybrid model also has advantages over the ANN benchmark model in terms of its greater capacity to improve system understanding and to support management decisions.
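The performance measures quoted in the abstract can be computed with a short Python sketch. It assumes the standard definitions: NSE = 1 − SSE/SST (SST being the sum of squared deviations of the observations from their mean), RMSE = sqrt(SSE/n), and the Gbench index = 1 − SSE(model)/SSE(benchmark), so positive values mean the model outperforms the benchmark. The helper `g_bench_from_nse` is a hypothetical convenience. It relies on the fact that, when both models are evaluated on the same validation observations, their NSE values share the denominator SST.

```python
import math
from typing import Sequence


def _sse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Sum of squared errors between observed and simulated series."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / SST."""
    mean_obs = sum(obs) / len(obs)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - _sse(obs, sim) / sst


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean squared error, in the units of the data (here mg/L)."""
    return math.sqrt(_sse(obs, sim) / len(obs))


def g_bench(obs: Sequence[float], sim: Sequence[float], bench: Sequence[float]) -> float:
    """Benchmark efficiency: 1 - SSE(model) / SSE(benchmark)."""
    return 1.0 - _sse(obs, sim) / _sse(obs, bench)


def g_bench_from_nse(nse_model: float, nse_bench: float) -> float:
    # On the same observations, SSE = (1 - NSE) * SST for each model,
    # so SST cancels in the ratio SSE(model) / SSE(benchmark).
    return 1.0 - (1.0 - nse_model) / (1.0 - nse_bench)
```

Applied to the reported values, `g_bench_from_nse(0.89, -0.14)` gives about 0.90 and `g_bench_from_nse(0.89, 0.83)` gives about 0.35. These agree with the reported Gbench values of 0.90 and 0.36 to within the rounding of the published NSE values.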