Interactive comment on “Data-driven catchment classification: application to the PUB problem”

Abstract. A promising approach to catchment classification makes use of unsupervised neural networks (Self Organising Maps, SOM's), which organise input data through non-linear techniques depending on the intrinsic similarity of the data themselves. Our study considers ∼300 Italian catchments scattered nationwide, for which several descriptors of the streamflow regime and geomorphoclimatic characteristics are available. We compare a reference classification, identified by using indices of the streamflow regime as input to SOM, with four alternative classifications, which were identified on the basis of catchment descriptors that can be derived for ungauged basins. One alternative classification adopts the available catchment descriptors as input to SOM; the remaining classifications are identified by applying SOM to sets of derived variables obtained by applying Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) to the available catchment descriptors. The comparison is performed relative to a PUB problem, that is, the prediction of several streamflow indices in ungauged basins. We perform an extensive cross-validation to quantify nationwide the accuracy of predictions of mean annual runoff, mean annual flood, and flood quantiles associated with given exceedance probabilities. Results of the study indicate that performing PCA and, in particular, CCA on the available set of catchment descriptors before applying SOM significantly improves the effectiveness of SOM classifications by reducing the uncertainty of hydrological predictions in ungauged sites.

RC:
tion. Using SOMs in combination with data reduction techniques is one strategy to achieve this.
My main criticism at this point is that the authors make no attempt at a physical interpretation (explanation?) of the result. The authors should expand their discussion to explain which catchments are grouped, why it might be hydrologically appropriate to group them (with respect to flood behavior), etc. The authors present an interesting technique that they test in a reasonable way. Now they just have to expand it so that the paper becomes appropriate for a hydrology journal. This is something that I think the authors are very capable of doing.

AC:
We are a bit puzzled by the Referee's main criticism. The main goal of the analysis is to assess whether (unsupervised and objective) multivariate techniques may improve the effectiveness of an unsupervised and objective approach (Self Organizing Maps, SOM's) to the problem of catchment classification within the general PUB context. SOM's have a number of hydrological applications published in hydrological journals (see e.g. Hall and Minns, 1999; Jingyi and Hall, 2004; Srinivas et al., 2008; Toth, 2009). We show in the study, qualitatively and quantitatively, that SOM's benefit from the application of such multivariate techniques in predicting long-term streamflow indices (mean annual runoff) as well as flood flows.
If our main goal had been to derive a physically based catchment classification, assuming that this task can be carried out at all at the national level in a country with such large hydrological variability, we would probably have adopted different approaches and techniques to begin with.
Our study falls instead within the vast literature applying multivariate statistical analyses, and in particular clustering algorithms based on geomorphologic and climatological catchment characteristics, to obtain an "objective" identification of groups (clusters) of watersheds with similar attributes. It is the information content of such attributes
HESSD 8, C836-C844, 2011
We clearly point out in the discussion section and, in particular, in the Conclusions that "the application of objective but merely statistical criteria and algorithms (PCA and CCA with SOM) revealed some limitations that may be significantly reduced by switching from data-driven to data- and process-driven catchment classification".
Nevertheless, we believe that the Referee's point may arise from a lack of clarity in our manuscript about the main goal of the study. We will revise the manuscript accordingly, by including a direct reference to the main goal outlined above in the introduction and also in the abstract, since we are asked to heavily restructure it (see below).

RC:
Specific Comments
- Please use more acronyms in your abstract!!! Just kidding. Please take out the acronyms there. It is not necessary and makes reading the abstract very cumbersome.
- It would be good to have less detail on the method in the abstract, but actually read about real results there so that the reader knows what he/she will get from reading the paper.
- The authors should avoid very short paragraphs. Certainly one-sentence paragraphs are not good style.
- It would be good to define in the beginning of the paper (or in the abstract) that the authors are not talking about predictions of continuous streamflow, but rather about different flow indices.

AC:
We will modify the abstract as recommended. Also, we will carefully revise the abstract and body of the manuscript, trying to make short sentences as long, verbose and unclear as possible!!! Just kidding.
We will avoid very short sentences in the revised manuscript.

RC:
- Is the ranking of controlling variables (i.e. controlling the classification) similar?
- What assumptions are made regarding how the physical/climatic characteristics control the hydrologic behavior of catchments?

AC:
The Referee probably refers to the weight or importance of different descriptors for the identification of the different classes. For the same reasons reported in our reply to the main point raised by Referee #1, we preferred to refer in the study to the similarity and affinity of the different classes in terms of membership of different catchments (section 5.3) and to the performance of each classification for the prediction of streamflow indices (section 5.4).
To form the dataset of physiographic and climatic descriptors to be used in the study we did not make any a priori assumption, since we were already constrained by the intrinsic difficulty of compiling a national dataset that includes homogeneous and consistent information for ∼300 catchments. Therefore we included as many relevant catchment descriptors as possible, using multivariate analysis techniques to sort out noise and redundancy (Principal Component Analysis, see e.g. Chokmani and Ouarda, 2004; Castiglioni et al., 2011) while retaining the information that is most descriptive of the streamflow regime (Canonical Correlation Analysis, see e.g. Krzanowski, 1988; Ouarda et al., 2001).
Neither of these considerations was explicitly reported in the text. We will include a reference to the first one at the beginning of section 5, which could also be useful for introducing the structure of section 5 itself. We will also revise the manuscript by including a reference to our second comment at the beginning of subsection 4.1.
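
As an illustration only (this is not the code used in the study), the redundancy-screening role of Principal Component Analysis mentioned in this reply can be sketched in a few lines of Python. The descriptor table is synthetic, the function names (`standardise`, `covariance`, `leading_component`) are hypothetical, and power iteration is used in place of a full eigendecomposition purely to keep the sketch dependency-free. Two of the three synthetic descriptors carry almost the same information, so a single component is expected to absorb most of their joint variance.

```python
import math
import random


def standardise(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Sample covariance of centred data (the correlation matrix if z is standardised)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(cov, iterations=500):
    """Approximate the leading eigenpair of a symmetric matrix by power iteration.

    Assumption: the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


# Synthetic descriptors (hypothetical): two near-duplicate "size" descriptors
# and one unrelated descriptor, for 300 fictitious catchments.
random.seed(0)
rows = []
for _ in range(300):
    size = random.gauss(0.0, 1.0)
    rows.append([size + random.gauss(0.0, 0.1),
                 size + random.gauss(0.0, 0.1),
                 random.gauss(0.0, 1.0)])

corr = covariance(standardise(rows))
eigenvalue, loadings = leading_component(corr)
share = eigenvalue / len(corr)  # fraction of total standardised variance
print("loadings:", [round(x, 3) for x in loadings])
print("share of variance:", round(share, 3))
```

In this toy setting the two redundant descriptors receive nearly equal loadings on the leading component, which is the sense in which the component replaces them with a single derived variable before classification.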

RC:
- I would separate section 2 into one section reviewing catchment classification and one discussing the issue of SOMs for classification in general.

AC:
Point taken. Section 2 will be revised according to the Referee's comment, even though the main focus of the second subsection will necessarily be the hydrological application of SOM's rather than their application in general.

RC:
- It would be good to discuss (at the end) how this information (regionalized streamflow indices) could be used further. For example, several authors (starting with Bardossy, 2007, JoH; and Yadav et al., 2007, Advances in Water Resources) suggest that these indices provide valuable information that can be assimilated into watershed models to reduce uncertainty in (continuous streamflow) predictions in ungauged basins.

AC:
The revised manuscript will discuss the possible usefulness of catchment classification for the problem of rainfall-runoff model regionalization, including relevant references (we are aware of Bárdossy's study on the regionalization of rainfall-runoff models published in HESS in 2007, but not of any published in JoH in 2007). The following paragraph will be included in the Introduction, around line 22 of page 393: "In particular, catchment classification may support regionalization of rainfall-runoff parameters (Hundecha et al., 2008), a topical issue in hydrology (see e.g., Bardossy, 2007; Yadav et al., 2007; Castiglioni et al., 2010) which is also particularly relevant

AC:
Please refer to our reply to the previous point on a national dataset of catchment descriptors and streamflow indices. We acknowledge that our dataset lacks information on base-flow and subsurface characteristics in general, and also on land cover and vegetation. Unfortunately, we could not find consistent and homogeneous information on these characteristics nationwide. We will point out this limitation of our study in section 4 of the revised manuscript (Study Area and available information).

RC:
- The mapping onto nodes within the SOM means that there is a frequency distribution 'in' each node. Could the uncertainty in this mapping be used to derive estimates of uncertainty in the predicted streamflow indices?

AC:
This is a very good point. Even though this issue is clearly beyond the scope of our study, we will mention this interesting research topic in the last paragraph of the discussion section (future analyses).
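
To make the idea of a frequency distribution 'in' each node concrete, a minimal Python sketch follows. It is an assumption-laden illustration, not part of the study. Trained node weight vectors are taken as given, each catchment is assigned to its best-matching node by Euclidean distance, and the spread of an observed streamflow index among the catchments sharing a node is summarised as a crude proxy for prediction uncertainty. All names (`best_matching_unit`, `node_summaries`) and all numbers are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def best_matching_unit(x, node_weights):
    """Index of the node whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(node_weights)),
               key=lambda k: math.dist(x, node_weights[k]))


def node_summaries(descriptors, index_values, node_weights):
    """Group observed index values by best-matching node and summarise their spread."""
    by_node = defaultdict(list)
    for x, q in zip(descriptors, index_values):
        by_node[best_matching_unit(x, node_weights)].append(q)
    return {
        node: {
            "n": len(values),
            "mean": statistics.mean(values),
            # The sample standard deviation needs at least two catchments per node.
            "stdev": statistics.stdev(values) if len(values) > 1 else None,
        }
        for node, values in by_node.items()
    }


# Hypothetical two-node map and five fictitious catchments (standardised descriptors).
node_weights = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1], [0.9, 1.1], [1.2, 1.0]]
index_values = [1.0, 2.0, 3.0, 10.0, 14.0]  # e.g. a dimensionless flood index

summaries = node_summaries(descriptors, index_values, node_weights)
for node in sorted(summaries):
    print(node, summaries[node])
```

A prediction for an ungauged catchment mapped to a given node could then be reported together with that node's spread. Whether such a within-node spread is an adequate uncertainty estimate is exactly the open question raised in the comment.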

RC:
- It would be interesting if the authors would list the subjective choices that necessarily have to be made in this type of analysis, but that could influence the outcome (e.g. the Euclidean distance measure). This might help to guide future studies.