HESS Opinions: Deep learning as a promising avenue toward knowledge discovery in water sciences

Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in water science has so far been gradual, but the related fields are now ripe for breakthroughs. This paper proposes that DL-based methods can open up a viable, complementary avenue toward knowledge discovery in hydrologic sciences. In this new avenue, machine-learning algorithms present competing hypotheses that are consistent with data for scientists to further evaluate, and interrogative studies are invoked to interpret DL models. In addition, we lay out several opinions shared by the authors: (1) deep learning may bring transformative progress to the field of hydrology due to its ability to assimilate big data and identify commonalities and differences; (2) the community may benefit greatly from a variety of shared datasets and open competitions; (3) big hydrologic data can be obtained in various ways, including data compilation and working with citizen scientists, which offers the co-benefits of education and stakeholder engagement; (4) water sciences, and hydrology in particular, offer a unique set of challenges that can, in turn, stimulate advances in machine learning; and (5) there is an urgent need for research on hydrology-customized methods for interpreting the knowledge extracted by deep learning.

Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-168. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 9 April 2018. © Author(s) 2018. CC BY 4.0 License.


Overview
Deep learning (DL), which has gained widespread attention since 2012, is a suite of tools centered around artfully designed, large-size artificial neural networks. Compared to non-deep networks, DL is characterized by large network sizes that accommodate the complexity of information contained in big data, multiple levels of hidden representations, the addition of unsupervised learning units, and effective, large-scale regularization techniques. As a foundational component of modern artificial intelligence (AI), DL has made substantial strides in recent years and helped solve problems that had resisted AI for decades (LeCun et al., 2015). DL models have repeatedly been shown to outperform simpler models by large margins and to generalize better to unseen instances (Schmidhuber, 2015; Shen, 2017).
Deep networks may be more robust than simpler models despite their large size, if they are regularized properly and are chosen based on validation errors in a two-stage approach (Kawaguchi et al., 2017). Effective regularization techniques include (i) early stopping: monitor the training progress on a separate validation set and stop the training once validation metrics start to deteriorate; and/or (ii) novel regularization techniques such as dropout (Srivastava et al., 2014). DL models can be easier to train than previous networks, as their architectures and new stochastic gradient techniques (Kingma and Ba, 2014) address issues like the vanishing gradient (Hochreiter, 1998). Training networks as large as those used today was computationally impractical until scientists started to exploit the parallel processing power of graphics processing units (GPUs). Nowadays, new application-specific integrated circuits have also been created specifically for DL, although DL architectures are rapidly evolving.
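The early-stopping recipe in (i) amounts to a small amount of control logic around the training loop. The sketch below is a framework-free illustration; `train_step` and `val_loss` are hypothetical placeholders for a real training routine and validation metric, not functions from any particular library.

```python
# A minimal sketch of early stopping: training halts once the validation
# metric stops improving for `patience` consecutive checks.

def train_with_early_stopping(train_step, val_loss, max_epochs=100, patience=3):
    """train_step(epoch) updates the model; val_loss() returns the current
    validation error. Returns the epoch at which training stopped."""
    best, bad_checks, epoch = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_step(epoch)
        loss = val_loss()
        if loss < best:
            best, bad_checks = loss, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break  # validation metric deteriorated: stop training
    return epoch

# Toy usage: a validation loss that falls, then rises after epoch 4
history = [9, 7, 5, 4, 3, 4, 5, 6, 7, 8]
state = {"i": -1}
def step(epoch): state["i"] += 1
def vloss(): return history[state["i"]]
stopped_at = train_with_early_stopping(step, vloss, max_epochs=10, patience=3)
```

In the toy run, training stops at epoch 7, three checks after the validation minimum at epoch 4, rather than running all 10 epochs.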
In contrast to many older-generation nonlinear regression and classification methods, such as the Support Vector Machine (SVM) (Cortes and Vapnik, 1995), genetic programming (Koza, 1992), Classification and Regression Trees (CART) (Bae et al., 2010; Breiman et al., 1984), or random forests (Ho, 1995), to name a few, deep networks are differentiable from outputs to inputs, giving them practical advantages in efficient parameter optimization via backpropagation (training). This efficiency, which is shared by some other older-generation methods such as non-deep neural networks and Gaussian Processes (Snelson and Ghahramani, 2006), allows DL to be used as a powerful engineering and scientific design tool, whereby the often complicated effects of inputs on output variables can be estimated in a data-driven way. Moreover, the differentiable nature allows for greater success in interpolation and mild extrapolation, contributing to the strong generalization capability of DL.
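The advantage of differentiability can be shown in miniature. In the sketch below, because the loss is differentiable with respect to the weight, its gradient is computed exactly and used to optimize the weight directly; this is the idea behind backpropagation, reduced here to a single weight fitting the toy relation y = 2x (all data are fabricated for illustration).

```python
# Gradient descent on a differentiable loss, with the gradient written out
# by hand for a one-weight model y_hat = w * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05
for _ in range(200):
    # loss = mean((w*x - y)^2);  dloss/dw = mean(2*(w*x - y)*x)
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # gradient descent step
# w has converged to the true slope, 2.0
```

A deep network does the same thing, except the chain rule propagates such gradients through millions of weights at once.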
It has been shown that deep networks can continue to improve when the number of training instances (e.g., images) is increased to hundreds of millions, albeit at a logarithmic rate (Sun et al., 2017). Simpler networks would have long stalled in performance prior to reaching this amount of data because they are unable to represent the complexity of the data. Lastly, like some older-generation methods, DL offers the possibility of transfer learning (Mesnil et al., 2012), where a complex deep model trained to perform a given task can be re-trained for a different but related purpose at a comparatively small computational cost. For DL, transfer learning is simple to implement: only the output layer needs to be re-trained, while the other network layers, which encode a deep representation of the input data, are left intact. While DL has stimulated exciting advances in many disciplines and has become the method of choice in some areas, water sciences have so far seen only a very limited set of DL applications. Despite scattered early reports of promising DL results (Fang et al., 2017; Laloy et al., 2017, 2018; Tao et al., 2016; Vandal et al., 2017; Zhang et al., 2018), water scientists seem to have had reservations about these new tools, perhaps with good reason. This opinion paper, endorsed by the cohort of authors, argues that there are many opportunities in water sciences where DL can help provide both stronger predictive capabilities and a complementary avenue toward scientific discovery. Readers who are less familiar with machine learning or deep learning are referred to a companion review paper (Shen, 2017) (hereafter referred to as Shen17), which provides a more comprehensive and technical background.
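The transfer-learning recipe mentioned above (re-train only the output layer) can be sketched without any DL framework: treat the trained layers as a frozen feature extractor and re-fit only a linear output layer on the new task. The "frozen" features below are hypothetical stand-ins for learned representations, and the new task is fabricated so that it is expressible with those features.

```python
# A minimal transfer-learning sketch: frozen features + re-trained linear head.

def frozen_features(x):
    # Pretend these nonlinear features were learned on the original task.
    return [x, x * x]

def refit_output_layer(data, lr=0.01, epochs=4000):
    """Re-train only the linear output layer on top of the frozen features,
    via stochastic gradient descent on a squared-error loss."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# New but related task: y = 3x + x^2, expressible with the same features
new_task = [(0.5, 1.75), (1.0, 4.0), (2.0, 10.0)]
w = refit_output_layer(new_task)  # approaches [3.0, 1.0]
```

Only the two head weights are updated; the feature extractor is untouched, which is why the computational cost of transfer learning is small compared to training the full network.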
We first argue that the elements of a complementary, machine-learning-based scientific discovery avenue are taking shape, and that this avenue should at least be considered for problems with large data (section 2). Then, we propose several ways to accelerate this avenue (section 3). Finally, we argue that hydrology offers a unique set of challenges for DL research (section 4).

The emergence of a complementary avenue
We have witnessed the growth of three pillars that support a complementary research avenue utilizing deep learning: big hydrologic data, powerful machine learning algorithms, and interrogative methods to extract interpretable knowledge from trained networks. We discuss these aspects in the following sections.

With more data, opportunities arise
The fundamental supporting factor for emerging opportunities with DL is the growth of big hydrologic data. Ever-increasing amounts of hydrologic data are becoming available through remote sensing (see a summary in Srinivasan (2013)) and data compilation.
Large available datasets include satellite-based data products of precipitation, surface soil moisture (Entekhabi, 2010; Jackson et al., 2016; Mecklenburg et al., 2008), vegetation states and indices (e.g., Knyazikhin et al., 1999), derived evapotranspiration products (Mu et al., 2011), terrestrial water storage (Wahr et al., 2006), and snow cover (Hall et al., 2006), as well as a planned mission for streamflow (Pavelsky et al., 2014). On the data compilation side, there are now compilations of geologic (Gleeson et al., 2014) and soil datasets; centralized management of streamflow and groundwater data in the United States, Europe, parts of South America and Asia, or globally for some large rivers (GRDC, 2017); and water chemistry, groundwater samples, and other biogeophysical datasets.
One of the 10 Big Ideas for Future Investments from the U.S. National Science Foundation is "Harnessing data for 21st-century science and engineering" (NSF, 2018). With these emerging datasets, DL models can be built and trained to learn features, organizational patterns, and relationships, and to predict outputs given new input instances. However, we are not advocating a wholesale transition to DL: not all problems can be suitably formulated as DL problems; some may be best tackled by specifically designed earlier-generation methods, and for many problems there are simply not enough data to train DL-based models.

DL: A big step forward
The field of hydrology has witnessed the flows and ebbs of several generations of machine learning methods over the past few decades. From regularized linear regression (Tibshirani and Tibshirani, 1994) to Support Vector Regression (Drucker et al., 1996), from genetic programming (Koza, 1992) to artificial neural networks (Chang et al., 2014; Chen et al., 2018; Hsu et al., 1995, 1997, 2002), from classification and regression trees to random forests, and from Gaussian Processes (Snelson and Ghahramani, 2006) to Radial Basis Function Networks (Moradkhani et al., 2004), each algorithm has offered useful solutions to a set of problems, but each also faces its own limitations. As a result, over time, some may have grown dispassionate about progress in machine learning, and some may have concerns about whether DL represents real progress or just "hype". A frequent limitation of conventional neural network studies is that networks are trained in one geographic region or site and typically cannot be transferred out of the training region. Large neural networks may also be overfitted, and are prohibitively expensive to train in terms of computation.
The progress brought forth by DL to the information technology industry is revolutionary (Section 4 in Shen17) and can no longer be ignored. Primary types of successful deep learning architectures include convolutional neural networks (CNNs) for image recognition (Krizhevsky et al., 2012b; Ranzato et al., 2006); long short-term memory networks (LSTM) (Greff et al., 2015; Hochreiter and Schmidhuber, 1997) for time series modeling; and variational auto-encoders (VAE) (Kingma and Welling, 2013) and deep belief networks for pattern recognition and data generation (typically images, but also text, sound, etc.) (section 3.2 in Shen17). CNNs and LSTMs have earned major recognition in research spearheaded by the information technology industry. Besides these new architectures, a novel generative model concept called generative adversarial networks (GANs) has become an active area of research. The key characteristic of GANs is that they are learned by creating a competition between the actual generative model, or "generator", and a discriminator in a zero-sum game framework (Goodfellow et al., 2014), in which the two components are learned jointly. Compared to other generative models, GANs potentially offer much greater flexibility in the patterns to be generated. The power of GANs has been recognized recently in the geoscientific community, especially in physics-inspired machine learning research, where deep generative models have been used for certain complicated physical, environmental, and socio-economic systems (Albert et al., 2018; Laloy et al., 2018).
The evidence is mounting that, when given enough data, DL can automatically extract features, sometimes better than human experts do:

- The ImageNet Large Scale Visual Recognition Challenge was cumulatively compiled, with convenient and uniform data access provided by the organizers. The 2010 contest was won by a large-scale SVM. CNNs first won this contest in 2012 (Krizhevsky et al., 2012a). Since then, and until 2017 (the last contest), the vast majority of entrants and all contest winners used CNNs, which edged out other methods by large margins (Schmidhuber, 2015).
- The IJCNN traffic sign recognition contest, which comprises 50,000 images (48 pixels x 48 pixels), witnessed superhuman visual recognition performance from CNN-based methods (Stallkamp et al., 2011). Superhuman performance was also achieved by CNNs in recognizing cancers from medical images (Yu et al., 2016).
- The TIMIT speech corpus is a dataset holding recordings from 630 English speakers. LSTM-based models showed a large edge over Hidden Markov Model (HMM) results in recognizing these speeches (Graves et al., 2013).
- An LSTM-based speech recognition system has achieved "human parity" in conversational speech recognition on the Switchboard corpus (Xiong et al., 2016). A parallel version achieved the best-known pixel-wise brain image segmentation results on the MRBrainS13 dataset (Stollenga et al., 2015). The improvement in language translation software can be witnessed by ordinary web users.
- A time-series forecasting contest, the Computational Intelligence in Forecasting Competition, was won by a combination of fuzzy and exponential models in 2015, when no LSTM entry was present, but an LSTM won it in 2016 (CIF, 2016).
In the sciences, DL models are quickly becoming the method of choice for analyzing data in high-energy physics, chemistry, biology, astrophysics, and remote sensing (section 4.3 in Shen17), not to mention medical applications such as neuroscience.
In addition to utilizing big data, DL is able to create valuable big datasets that would not otherwise have been possible. For example, utilizing DL, researchers were able to generate new datasets for tropical cyclones, atmospheric rivers, and weather fronts by tracking them (Liu et al., 2016; Matsuoka et al., 2017). DL has been employed to achieve dynamical climate downscaling (Vandal et al., 2017), remote sensing of precipitation (Tao et al., 2017, 2018), crop yield estimation (You et al., 2017), extension of satellite-sensed soil moisture records (Fang et al., 2017), and detection of crop diseases (Pryzant et al., 2017). All these datasets concern abstract variables that can now be reliably retrieved by DL. We agree that, just like other methods, DL may eventually be replaced by newer ones, but that is not a reason to hold out on possible progress.

Network interrogative methods to enable knowledge gain from deep networks
Conventionally, neural networks were primarily used to approximate mappings between inputs and outputs, and the focus was put on improving predictive accuracy. Regarding the use of neural networks in scientific research, then, there have been concerns: (1) DL, and more generally machine learning (ML), is referred to as a black box that cannot be understood by humans and thus cannot serve to advance scientific understanding; and (2) data-driven research lacks clearly stated hypotheses. There has been significant pressure from inside and outside the deep learning community to make network decisions more explainable. For example, European laws dictate that automated individual decision making which significantly affects users must provide a "right to explanation", whereby a user can ask for an explanation of an algorithmic decision (Goodman and Flaxman, 2016).

Some recent progress in DL research has focused on addressing these concerns. Notably, a new sub-discipline known as "AI neuroscience" has produced useful interrogative techniques to help scientists interpret the knowledge extracted by deep networks from data (see literature in Section 5.2 in Shen17). Such methods include: (i) attributing deep network decisions to input features or a subset of inputs; for image recognition tasks, for example, the decision of the network can be traced to the regions on the image that led to the decision (Montavon et al., 2017); (ii) transferring knowledge from deep networks to interpretable, reduced-order models; for example, a trained deep vision network can be used to train simpler models such as classification trees (Ribeiro et al., 2016); (iii) visualizing network activations (e.g., Samek et al., 2017; Yosinski et al., 2015); for example, activations of recurrent neural networks can be visualized to show the control domains of certain cells, which explains their functioning (Karpathy et al., 2015); and (iv) problem-specific, ad hoc analytic methods; for example, certain signals from the inputs can be added or removed to examine the impacts of such features (Alipanahi et al., 2015). Among these, (i)-(iii) have mostly been developed in the computer science domain, while (iv) requires the most effort and collaboration between domain scientists and computer scientists.
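Idea (ii), transferring knowledge from a deep network to a reduced-order model, can be illustrated without any DL library. Below, a stand-in "deep model" labels data, and a one-split decision stump, a minimal interpretable surrogate, is fit to reproduce those labels; the threshold 0.37 and all names are hypothetical.

```python
# Distilling a (stand-in) deep model's decisions into a human-readable rule.

def deep_model(x):
    # Stand-in for a trained network's binary decision on a 1-D input.
    return 1 if x > 0.37 else 0

def fit_stump(xs, labels):
    """Find the single threshold on x that best reproduces the labels."""
    best_t, best_acc = None, -1.0
    for t in sorted(xs):
        acc = sum((1 if x > t else 0) == y for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

xs = [i / 100 for i in range(100)]
labels = [deep_model(x) for x in xs]         # query the "deep" model
threshold, fidelity = fit_stump(xs, labels)  # interpretable surrogate rule
```

The surrogate recovers the rule "predict 1 when x > 0.37" with perfect fidelity on this toy data; with a real network the recovered rule is only an approximation, and its fidelity score indicates how much to trust it.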

The complementary research avenue
As the interrogative methods grow further, a complementary avenue toward attaining knowledge emerges, as shown in Figure 1. The data-driven research avenue can be divided into four steps: (i) hypotheses are generated by machine learning algorithms from data; (ii) in the validation step, data withheld from training are employed to evaluate the machine-learning-generated hypotheses; (iii) interpretive methods are employed to extract data-consistent and human-understandable hypotheses (described in Section 2.3); and (iv) the retained hypotheses are presented to scientists for analysis and further data collection, and the process iterates.
The classical avenue faces non-uniqueness and subjectivity. To give a concrete example, consider the classical problem of rainfall-runoff modeling. Suppose a hydrologist found that hydrologic responses in several nearby basins differ. Some basins produce flashier peaks, while others have smaller peaks in summer, large seasonal fluctuations, and large peak streamflows only in winter. Taking a modeling approach, the hydrologist might invoke a conceptual hydrologic model, e.g., Topmodel (Beven, 1997); however, the model results may not adequately describe the observed heterogeneity in the rainfall-runoff response. The hydrologist might hypothesize that the different behaviors are due to heterogeneity in soil texture that is not well represented in the model. The hydrologist may then add processes that represent soil spatial heterogeneity, such as modified soil pedo-transfer functions that can differentiate between the soil types in different regions. Perhaps with some parameter adjustment, this model can provide streamflow predictions that are qualitatively similar to the observations. This procedure then increases the hydrologist's confidence that the heterogeneity in soil hydraulic parameters is responsible for the different hydrologic responses. However, this improvement is not conclusive due to process equifinality: there can be alternative processes that also result in similar outcomes, e.g., the influence of soil thickness, terrain, or drainage density. The identification of potential improvements may depend on the hydrologist's intuition or preconceptions, which are nonetheless important but potentially biased. Furthermore, incorporating all the physics into the model may prove technically challenging or too time-consuming.
Compared to the classical avenue, the data-driven approach may help scientists explore a larger set of hypotheses more efficiently. Although it cannot be said that machine learning algorithms present no human bias (because inputs are human-defined and some hyperparameters are empirically adjusted), the larger set of hypotheses presented will at least greatly reduce that risk. First, let us examine a CART-based data-driven approach. We could start with physiographic data for many basins in this region, including terrain, soil type, soil thickness, etc. We can use CART to model the process-based model's errors, which allows us to separate out the conditions under which these errors occur more frequently. We let the pattern emerge from the data without enforcing a strong, human-preconceived hypothesis. Attention must be paid to the robustness of the data mining, utilizing a holdout dataset or cross-validation to verify the generality of the conclusions. The data may suggest that soil thickness is the main reason for the error. Or, if the data do not prefer one hypothesis over another, then all hypotheses remain equally possible and cannot be ruled out; summarized in a short phrase, "an algorithm has no ego." On a practical level, this approach can examine multiple competing hypotheses more efficiently and simultaneously.
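As a toy, fully deterministic illustration of the workflow just described (with made-up basin attributes, not real data), the sketch below fits a single-split rule, a minimal stand-in for CART, to a process-based model's errors, then checks the discovered rule on held-out basins.

```python
# Synthetic basins: (soil_thickness_m, model_error). In this illustration,
# thin soils (< 1 m) are assumed to produce large simulation errors (0.8).
order = [(i * 7) % 40 for i in range(40)]  # deterministic "shuffle"
basins = [(0.2 + 0.07 * j, 0.8 if 0.2 + 0.07 * j < 1.0 else 0.2) for j in order]
train, holdout = basins[:30], basins[30:]

def sse(group):
    """Sum of squared deviations of model errors within a group."""
    if not group:
        return 0.0
    m = sum(e for _, e in group) / len(group)
    return sum((e - m) ** 2 for _, e in group)

def best_split(data):
    """Soil-thickness threshold minimizing within-group error variance."""
    return min((sse([b for b in data if b[0] < t]) +
                sse([b for b in data if b[0] >= t]), t)
               for t, _ in data)[1]

t_split = best_split(train)                       # found near 1.1 m here
thin = [e for th, e in holdout if th < t_split]   # held-out verification
thick = [e for th, e in holdout if th >= t_split]
```

The split emerges from the data at roughly the 1 m soil-thickness boundary built into the toy data, and the held-out basins confirm that errors are larger on thin soils, i.e., the holdout step guards against a rule that merely fits noise in the training basins.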
One example of such analyses was carried out in Fang and Shen (2017), where differences in basin storage-streamflow correlations were explained by physical factors using CART, an earlier-generation data mining method (Figure 2). The data mining analysis allowed patterns to emerge, which inspired hypotheses about the key factors that control the hydrologic functioning of different systems; for example, soil thickness and soil bulk density were found to be important controls on drought recovery, while biodiversity showed only secondary importance (Schwalm et al., 2017). Scientists need to define the predictors and general model types, but they do not pose strongly constraining hypotheses about the controlling factors, and instead "let the data speak". The key to this approach is a large amount of data from which patterns can emerge.

When working with DL models, we need to further resort to interrogative methods to make the results more interpretable (Figure 1, right). For example, we can construct DL models to predict the errors of the process-based model, and then use visualization techniques to see which variables, under which conditions, lead to the error. Because DL can absorb a large amount of data, it can find commonality among data as well as identify differences. Whereas CART models are limited by the amount of data and face stability problems in lower branches (data become exponentially scarcer at lower branches), DL models may produce a more robust interpretation. The machine learning paradigm lends itself to finding "unrecognized linkages" (Wagener et al., 2010), or complex patterns in the data that humans could not easily recognize or capture. Owing to its strong capability, DL can better approximate the "best achievable model" (BAM) for the mapping relations between inputs and outputs. As such, it lends support to measuring the information content contained in the inputs about the output. Nearing et al. (2016) utilized Gaussian Process regression to approximate the BAM; DL can play similar roles and can also allow for modelling, perhaps in a more thorough way.

Outputs from the hidden layers of deep networks can now be visualized to gain insights into the transformations performed on the input data by the network (Samek et al., 2017). For image recognition tasks, one can invert the DL model to find the parts of the inputs that led the network to make a certain decision (Mahendran and Vedaldi, 2015). There are also means to visualize outputs from recurrent networks, e.g., showing the conditions under which certain cells are activated (Karpathy et al., 2015). These visualizations can illustrate the relationships that the data-driven model has identified.

Considering the above potential benefits, the data-driven avenue should at least be considered, or given an opportunity to play a role, in water sciences discovery. However, this avenue may be uncomfortable for some researchers. In the classical avenue, the scientist must originate the hypotheses before constructing models; in the data-driven avenue, one needs to set up the algorithm to model a certain target. The data mining/knowledge discovery process is then a precursor step to the formation of the main hypotheses: hypotheses cannot be generated before the data mining analysis. This feature is a natural consequence of handing part of the work to an algorithm, but it may cause some disarray for those who follow what has been perceived as the structured scientific method. In particular, hypotheses can no longer be unequivocally stated during the proposal stage of research.
Granted, the interrogative methods as a whole are new, and time is required for them to grow; the nascent "AI neuroscience" literature did not exist until 2015. However, if we outright reject the complementary avenue based on our habitual thinking that neural networks are black boxes, we may deny ourselves opportunities for breakthroughs.

Community-coordinated hydrologic modeling competitions to explore the "performance-explainability Pareto front"
We share the opinion that open, fast, and standardized competitions are a very effective way of accelerating progress in an area. Hydrologists are charged by society to deliver accurate predictions of floods, droughts, groundwater levels, and other variables. Improving the performance of methods is thus not only of scientific interest but also of societal urgency. In addition to commonly employed metrics, we can formulate competitions that are evaluated based on the attainment of understanding.
As discussed earlier, a prime example of the use of competitions is the community-coordinated challenges in computer science, which paved the way for accelerated design improvement. New methods can be evaluated objectively. Via these competitions, the community can learn the advantages and disadvantages of particular network designs, and the strengths and weaknesses of different methods can be thoroughly explored and laterally compared. However, these results do not suggest that other statistical methods have no value. Rather, data limitations and various constraints often make simpler methods valuable.
However, an undeniable and increasingly apparent trend is that deep learning shows unrivaled predictive performance.
We envision multi-faceted hydrologic modeling competitions where various models, ranging from process-based ones to deep learning ones, are evaluated and compared. The coordinators would provide a set of standard atmospheric forcings, landscape characteristics, and observed variables. The participants should submit results driven by the standard inputs, but they may optionally also use their own inputs. Target observations may be soil moisture, streamflow records, and/or groundwater levels.
Importantly, the evaluation criteria would include not only performance-type criteria, such as model efficiency coefficients and bias, but also qualitative/explanatory ones, such as explanations for control variables and model errors. Over-simplified or poorly constructed models may provide more accessible explanations, but these might be misleading because the models may be overfitted to a given situation. Their simplicity may also constrain their ability to digest large datasets as a way of reducing uncertainty. Multi-faceted competitions would also allow us to identify a "Pareto front" of explainability and performance and help rule out "false explanations". The objective of the competition is not only to seek the best simulation performance, but also the methods that offer deeper insight into hydrologic dynamics.
Another important value of competitions is that organizers will provide a standard input dataset and well-defined tasks, which greatly saves community resources and effort, so that participants can focus on the modeling aspects. A substantial amount of effort is required to establish such a dataset, which may only be possible under a specifically designed project. However, any effort invested in creating the dataset will return great value to the community.

Collecting big data through data sharing and citizen scientists
For problems that involve abstract information that cannot be directly sensed, a major obstacle is collecting enough supervising data to train DL models. Data collection can be greatly enhanced by centralized data compilation, a task many institutions are already undertaking. For example, the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) hosts large amounts of hydrologic data. As another example, in 2015 a project called Collaborative Research Actions (Endo et al., 2015) was proposed in the Belmont Forum, a group of the world's major and emerging funders of global environmental change research. Many scientists from different countries joined the project and focused on the same issue, the Food-Energy-Water Nexus, sharing their heterogeneous data and research results from different regions.

Database organizers could help organize data in a way that facilitates data mining and deep learning. However, it is unlikely that all data can be stored in one location, considering the volume of high-resolution remote sensing data. This coordination would require consulting with data scientists when designing the infrastructure. In addition, besides providing data, a concurrent role that databases can take is to provide more channels for sharing experiences, scholarly discussions, and debates along with the generation of data.

Another important area where deep learning is expected to deliver significant value is the analysis of big and sub-research-quality data, such as those collected by citizen scientists. A valuable feature of water sciences is that they are accessible to ordinary people. Citizen scientists could help gather data on precipitation, temperature, humidity, soil moisture, river stage, and potentially groundwater levels. These quantities can be measured by inexpensive instruments like pressure gauges and moisture sensors. Volunteer scientists can also be asked for results in an active learning framework (Settles, 2012), i.e., they can be queried for more data at the instances that can best reduce the uncertainty of the predictions. Crowd-sourced data have already played roles in deep learning research (Huang et al., 2016; Izadinia et al., 2015), even though there are problems related to data quality. An important co-benefit of involving citizen scientists is education and outreach to the public. Active engagement is much more effective when the public has a stake in the research outcomes.
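The active-learning query step mentioned above can be sketched in a few lines: given an ensemble of (stand-in) model predictions at candidate sites, new citizen-science measurements are requested where the ensemble disagrees the most. All site names and values below are fabricated for illustration.

```python
# Uncertainty sampling for citizen-science queries: ask for measurements
# where an ensemble of models disagrees the most.

def ensemble_disagreement(predictions):
    """Variance of the ensemble's predictions at one site."""
    m = sum(predictions) / len(predictions)
    return sum((p - m) ** 2 for p in predictions) / len(predictions)

# Hypothetical streamflow predictions (m3/s) from 3 models at 5 candidate sites
pool = {
    "site_a": [10.1, 10.3, 10.2],
    "site_b": [4.0, 9.5, 1.2],   # models disagree strongly here
    "site_c": [7.7, 7.9, 7.8],
    "site_d": [3.1, 3.0, 3.2],
    "site_e": [12.0, 13.5, 12.4],
}
k = 2  # measurement budget: query the k most uncertain sites
queries = sorted(pool, key=lambda s: ensemble_disagreement(pool[s]),
                 reverse=True)[:k]
```

Here the budget of two measurements goes to `site_b` and `site_e`, where additional data would most reduce predictive uncertainty; real active-learning criteria (Settles, 2012) generalize this idea beyond ensemble variance.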

Water sciences provide unique challenges and opportunities for DL
Compared to classical problems such as image and speech recognition, to which DL techniques have been applied extensively, hydrology presents a unique set of challenges that are research opportunities for DL. For the most part, DL research has not covered these questions extensively, but they exist across disciplines. Water scientists and computer scientists can work together to address these questions, which may lead to advances in machine learning.
(1) Observations in hydrology, and water science in general, are often regionally imbalanced. For example, while streamflow data are relatively dense in the United States, they are very sparse in many other parts of the world. Even for variables that can be remotely sensed, e.g., soil moisture, dense canopies often prevent uniform observations of the variable. A significant body of literature studying this problem can be loosely summarized under the topic of "prediction in ungauged basins" (PUB) (Hrachowitz et al., 2013). However, PUB problems pose a significant challenge to data-driven methods: machine-learning models need to know their limits, make constrained extrapolations, and estimate their uncertainties.

(2) Compared to standard IT applications, such as speech or image recognition, water data are accompanied by a large number of "contextual variables" such as land use, climate, geology, and soil. These covary and exert complicated controls on hydrologic responses, but we have limited knowledge of some of them, especially subsurface properties like geology. Thus, there are significant uncertainties with respect to input datasets, and factorial covariation, or co-evolution (Troch et al., 2013), requires us to be careful with machine-learning-generated hypotheses.

(3) Hydrologic observations also tend to be incomplete, but there are multiple sources of observations. For example, top 5-cm surface soil moisture reflects only a very small fraction of the water cycle, but we can also observe terrestrial water storage, which is related to soil moisture. Thus, how to merge inter-related information from different sources so that they improve the prediction of each other is an important question that DL research has not studied extensively but can help address.
(4) Hydrologic problems fit poorly into the template of problems that the standard network structures (Section 3.2 in Shen, 2017) are designed for. While some direct applications, such as soil moisture hindcasting (Fang et al., 2017) and precipitation retrieval from images (Tao et al., 2016), are possible, we envision that many new types of problems may require customized structures. For example, catchment hydrologic problems have both spatial but static dimensions (topography and groundwater flow) and temporal dimensions (atmospheric forcing).
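One possible way to handle the mixed static/temporal setting in point (4) is a recurrent network that appends static catchment attributes to the forcing at every time step, so a single model can be trained across many catchments. The sketch below uses a bare numpy recurrent cell with random, untrained weights and synthetic inputs purely to show the data flow; it is an assumption-laden illustration, not a structure proposed in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)

T, n_forcing, n_static, n_hidden = 10, 2, 4, 8  # illustrative sizes

# Hypothetical weights; a real model would learn these from many catchments.
Wx = rng.normal(scale=0.1, size=(n_forcing + n_static, n_hidden))
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
Wo = rng.normal(scale=0.1, size=(n_hidden, 1))

def run_catchment(forcing, static):
    """Simple recurrent cell: static attributes are concatenated to the
    forcing at every time step, merging spatial-but-static and temporal
    inputs in one network."""
    h = np.zeros(n_hidden)
    outputs = []
    for t in range(forcing.shape[0]):
        x = np.concatenate([forcing[t], static])  # temporal + static inputs
        h = np.tanh(x @ Wx + h @ Wh)              # recurrent state update
        outputs.append(h @ Wo)
    return np.concatenate(outputs)                # e.g. a simulated flow series

forcing = rng.normal(size=(T, n_forcing))   # synthetic precipitation/temperature
static = rng.normal(size=(n_static,))       # synthetic topography/soil attributes
q = run_catchment(forcing, static)
print(q.shape)  # (10,)
```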
(5) For hydrologic DL systems to deliver useful predictions, customizing interrogative techniques to understand network decisions is a high priority. Related questions include: what sources of information were used by the DL system to make the useful prediction, what patterns trigger a good prediction, and what are the relationships between variables inside the network. As this area is relatively new for hydrology, especially in the context of deep learning, much effort is needed to develop related visualization and interrogative tools to be shared with the community.
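As one concrete example of an interrogative technique in the spirit of point (5), permutation importance shuffles one input at a time and measures how much the prediction error grows; inputs the model relies on produce large increases. The "model" and data below are synthetic stand-ins for a trained DL system, chosen so the answer is known in advance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: only feature 0 actually drives the response.
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)

def model(X):
    """Stand-in for a trained DL model; here it has learned the true relation."""
    return 2.0 * X[:, 0]

def permutation_importance(model, X, y):
    """Shuffle one input column at a time; the error increase measures how
    much the model relies on that input."""
    base = np.mean((model(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(np.mean((model(Xp) - y) ** 2) - base)
    return np.array(scores)

imp = permutation_importance(model, X, y)
print(imp.argmax())  # 0: the probe recovers the feature the model uses
```

Applied to a hydrologic network, the same probe could reveal, for instance, whether a streamflow prediction leans on forcing or on static catchment attributes.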
(6) There can be synergy between process-based models (PBMs) and DL models. Machine-learning models and PBMs complement each other rather than compete against each other. DL models have already been used as surrogate models for PBMs, but many novel ways of coupling the two should be investigated.
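A minimal sketch of the surrogate-modeling idea in point (6): run a (here, toy) process-based model at a modest number of parameter values, then fit a cheap emulator to the input-output pairs. A least-squares polynomial stands in for the neural-network emulator that would be used in practice; the recession model, parameter range, and fit degree are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def pbm(k):
    """Toy 'process-based model': linear-reservoir recession Q(t) = exp(-k t).
    Stands in for an expensive simulator."""
    t = np.linspace(0.0, 5.0, 50)
    return np.exp(-k * t)

# Run the PBM at a modest number of parameter values to get training data.
k_train = np.linspace(0.1, 1.0, 30)
Q_train = np.stack([pbm(k) for k in k_train])

# Surrogate: ordinary least squares on polynomial features of k -- a cheap
# stand-in for a trained neural-network emulator.
features = np.vander(k_train, 8)                  # [k^7 ... k^0]
coef, *_ = np.linalg.lstsq(features, Q_train, rcond=None)

def surrogate(k):
    """Emulator call: far cheaper than rerunning the PBM."""
    return np.vander(np.atleast_1d(k), 8) @ coef

k_new = 0.37                                      # parameter value not in training set
err = np.max(np.abs(surrogate(k_new)[0] - pbm(k_new)))
print(err < 1e-2)  # True: the surrogate closely reproduces the PBM
```

Once trained, such a surrogate can be embedded in calibration or uncertainty-quantification loops where thousands of PBM evaluations would otherwise be required.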
(7) DL also provides new opportunities for multiple-point (geo)statistics (MPS). A few researchers have started to use deep generative models for multiple-point geostatistical simulation (Mosser et al., 2017), along with inversion (Laloy et al., 2017, 2018), but progress in deep generative modeling is constantly being made, and the potential of this type of DL model for MPS applications (unconditional and conditional simulation, inversion, 2D-to-3D pore space reconstruction, etc.) remains largely unexplored.

Concluding remarks
In this opinion paper, we argue that scientists ought to give thought to a complementary research avenue in which DL-powered data mining is used to generate hypotheses, which are subsequently tested. In the past there may have been strong reservations toward black-box machine-learning algorithms. Significant effort has been put into the interpretation and understanding of deep learning networks, and hydrologists have the opportunity to push research forward in this regard. We have also argued for open hydrologic competitions that emphasize both performance and explainability. DL has powered breakthroughs in other disciplines. We argue that water sciences should make use of their big data and citizen science potential, and exploit DL as a valuable tool toward scientific discovery in water-related fields.


The ImageNet Challenge is an open competition to evaluate algorithms for object detection and image classification (Russakovsky et al., 2014). Topics change with each contest, and a dataset of ~14M tagged images and videos was disseminated rapidly, with reduced subjectivity in the evaluation. Since 2012, DL models have emerged as a dominant force in almost every contest where they were applicable. Despite substantial manual effort spent on earlier methods such as support vector machines (SVMs) and hidden Markov models (HMMs), deep neural networks repeatedly show advantages.

Figure 1. Comparing two alternative avenues toward gaining knowledge from data. In the classical avenue, scientists interpret data, […]

Figure 2. (Adapted from Fang and Shen, 2017; reprint permission obtained.) We calculated storage-streamflow correlation patterns over the continental United States (CONUS) and divided small and mesoscale basins into multiple classes. We studied which physical factors most cleanly separate the different correlation patterns. In this case, what separates the blue class (storage and streamflow are […]