the Creative Commons Attribution 4.0 License.
A hydrologist's guide to open science
Sheila M. Saia
Andrea L. Popp
Nilay Dogulu
Stanislaus J. Schymanski
Niels Drost
Tim van Emmerik
Download
- Final revised paper (published on 09 Feb 2022)
- Preprint (discussion started on 02 Aug 2021)
Interactive discussion
Status: closed
-
AC1: 'Principle 2 Section Inclusion for hess-2021-392', Caitlyn Hall, 09 Aug 2021
Upon pre-print publication, the following section was omitted. Attached is the preprint with it included in line with the rest of the text.
Principle 2 – Open Data: Open hydrologists document all components of their data collection and analysis pipeline, favoring open and non-proprietary technologies.
Hydrologists often combine data from a wide variety of field, laboratory, and computer sources, such as streamflow gauges, water samples, remote sensing datasets, digital elevation models, land use maps, and meteorological data. Data quality can only be assessed, and results potentially replicated, when the hardware design and specifications of measurement tools and data loggers are available to the public. We encourage the use of open (i.e., non-proprietary) data formats, hardware specifications, and software in data collection and processing workflows, and their systematic documentation, so that interested readers can re-use them.
Data from the laboratory is often exported in formats specific to the laboratory device and typically requires some reformatting before post-processing. The format of computer-generated data (e.g., hydrology model outputs) varies with the computer software that generated it. An open data collection and analysis pipeline includes information on (1) hardware and software used, (2) original and processed (meta-)data and databases, (3) data processing and analysis techniques and tools used, and (4) documentation of the overall analysis process, including assumptions and perceptual models (see Principle 1). Re-usability and transferability of software and data processing pipelines greatly accelerate scientific progress in hydrology by reducing time wasted on re-inventing the wheel, helping discover problems in the analysis, and improving the quality of hydrologic research.
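As an illustration only, the four pipeline components listed above could be captured in a small machine-readable record stored next to the data. The sketch below uses only the Python standard library; the function name `build_provenance_record`, the field names, and all example values (file paths, sensor description, processing steps) are hypothetical and not taken from the manuscript.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def build_provenance_record(hardware, raw_data, processed_data,
                            processing_steps, assumptions):
    """Collect the four pipeline components in one JSON-serializable dict."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # (1) hardware and software used
        "hardware": hardware,
        "software": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "executable": sys.executable,
        },
        # (2) original and processed data
        "data": {"raw": raw_data, "processed": processed_data},
        # (3) processing and analysis techniques
        "processing_steps": processing_steps,
        # (4) documentation of assumptions behind the analysis
        "assumptions": assumptions,
    }


# All values below are hypothetical placeholders.
record = build_provenance_record(
    hardware=["pressure transducer, model and firmware noted here"],
    raw_data=["raw/stage_logger.csv"],
    processed_data=["processed/daily_discharge.csv"],
    processing_steps=[
        "remove sensor dropouts",
        "apply rating curve",
        "aggregate to daily means",
    ],
    assumptions=["rating curve is stable over the study period"],
)

provenance_json = json.dumps(record, indent=2)
print(provenance_json)
```

Writing `provenance_json` to a plain-text file alongside the data keeps the record in an open, software-agnostic format.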
Practical Guide to Open Data Collection and Analysis
Open hydrologists share and cite the source and collection method of all qualitative and quantitative data involved in their research, including field, laboratory, computer, and/or third-party (online) data used. A current list of data repositories commonly used by hydrologists that adhere to open science standards is kept on open-hydrology.github.io. The best place to store data for an open hydrology project depends on the type and size of the data, the specific scientific domain, and other requirements stipulated by the funders and stakeholders. If an open hydrology study relies on third-party data that is not (yet) open, ask the original data creators to make the data or a data subset publicly available. Archived original, intermediate, and final versions of all data used to obtain the results of a particular study are crucial for reproducing open hydrology research. See Principle 4 for more details on publishing data.
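One possible way to make archived original, intermediate, and final data versions verifiable is to publish a checksum manifest with them. The following is a minimal sketch, assuming the three versions live in sub-directories of one archive folder; the directory names, file contents, and helper names (`sha256_of_file`, `build_manifest`) are illustrative assumptions, not part of the manuscript.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file below `root` (relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


# Build a tiny hypothetical archive with the three data versions.
example_files = [
    ("original", b"stage_m\n1.20\n"),
    ("intermediate", b"stage_m\n1.2\n"),
    ("final", b"discharge_m3_per_s\n3.4\n"),
]

with tempfile.TemporaryDirectory() as tmp:
    for stage, content in example_files:
        stage_dir = Path(tmp) / stage
        stage_dir.mkdir()
        (stage_dir / "data.csv").write_bytes(content)
    manifest = build_manifest(tmp)

for name, checksum in manifest.items():
    print(checksum, name)
```

Anyone who later downloads the archive can recompute the checksums and confirm that the files match the versions used in the study.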
To make data and analysis sharing more straightforward, a data management plan should be developed in the early stages of the research project, emphasizing open data principles and maintaining cyberinfrastructure and community standards. Data management plans describe where data will come from, what formats it will be stored in, who will manage and maintain it, how privacy will be maintained (if applicable), and how data and results will be shared and stored in the short- and long-term. Funders may require data management plans, which are then typically limited in length. However, extended data management plans can increase research project transparency and can be created using publicly available templates (e.g., ckan, DMPTool, resources.data.gov) that adhere to funder requirements and formatting. Some tools (e.g., ckan) can help hydrologists make previously unpublished data publicly available, even after publication.
Open hydrologists should explicitly provide public access (e.g., through a link accessible on the journal publication site) to: (1) raw data and associated metadata (including specifications of the devices used to collect data), (2) descriptions and citations for the analysis methods and software versions used, (3) workflows, code, and software developed to collect and analyze data, (4) descriptions of quality controls used when processing raw data, (5) final processed data, and (6) descriptive methods used to integrate data into other processing tools. The level of detail necessary to ensure openness can differ wildly between studies. When data sources, processing, and accessibility are complex, additional descriptions in an appendix or supplementary information may be appropriate upon publication of hydrologic research.
Ideally, all data used to draw conclusions should be published publicly to facilitate reproducibility, but copyright on third-party data, privacy, or other issues related to data sensitivity may prohibit open publication of all underlying data. Discuss, agree on, and document with your collaborators what can be shared publicly as early as possible. If certain datasets cannot be shared publicly, add a statement to the final publication explaining what conditions need to be fulfilled to obtain access to the data and why some data remain private. Relevant resources and local guidelines for data anonymization and sharing (e.g., General Data Protection Regulation) need to be considered before developing a data management plan and conducting research (Zipper et al., 2019). When making data publicly available, open hydrologists strive to store data in universal, non-proprietary, and software-agnostic formats that are compatible with most operating systems and include metadata (data about the data that provides background context). For example, text and tabulated data can be stored as plain American Standard Code for Information Interchange (ASCII) text instead of proprietary or software-specific types (e.g., Microsoft Word .docx or Excel .xlsx files) that require a paid software license to use. Even if it might be computationally efficient, avoid creating new file types that are specific to a certain model or software. For most hydrologic data, NetCDF (i.e., .nc) files are currently the gold standard for storing data and metadata. If metadata cannot be part of the data (file) itself, store the metadata in as close proximity to the data as possible. For example, open hydrologists can include links in the metadata to where the data is stored and vice versa.
Metadata should also use standard naming and unit conventions (e.g., SI units) and metadata formats (e.g., Water Metadata Language), and be informative and sufficiently complete to allow for a better understanding of the data and reproduction of study results.
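When metadata cannot live inside the data file itself, one simple pattern is a plain-text data file with a metadata "sidecar" file stored right next to it, each pointing to the other by name. The sketch below is an assumption-laden illustration using only the Python standard library: the column names, units, title, license string, and the `.metadata.json` naming convention are all hypothetical choices, not a standard prescribed by the manuscript.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical example values in SI units.
rows = [
    {"timestamp_utc": "2021-08-01T00:00:00Z", "discharge_m3_per_s": 3.4},
    {"timestamp_utc": "2021-08-02T00:00:00Z", "discharge_m3_per_s": 3.1},
]

# Hypothetical metadata; "data_file" links the metadata back to the data.
metadata = {
    "title": "Daily discharge at an example gauge",
    "data_file": "discharge.csv",
    "columns": {
        "timestamp_utc": {"description": "Start of day", "format": "ISO 8601, UTC"},
        "discharge_m3_per_s": {"description": "Mean daily discharge", "units": "m3 s-1"},
    },
    "license": "CC-BY-4.0",
}


def write_dataset(directory, rows, metadata):
    """Write a CSV data file and a JSON metadata sidecar side by side."""
    directory = Path(directory)
    data_path = directory / metadata["data_file"]
    meta_path = directory / (data_path.stem + ".metadata.json")
    with open(data_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(metadata["columns"]))
        writer.writeheader()
        writer.writerows(rows)
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


with tempfile.TemporaryDirectory() as tmp:
    data_path, meta_path = write_dataset(tmp, rows, metadata)
    # Read both files back to confirm they are plain, self-describing text.
    with open(data_path, newline="", encoding="utf-8") as handle:
        read_back = list(csv.DictReader(handle))
    meta_back = json.loads(meta_path.read_text(encoding="utf-8"))
    file_names = sorted(p.name for p in Path(tmp).iterdir())

print(file_names)
print(read_back[0])
```

Both files can be opened with any text editor on any operating system. For larger or gridded datasets, a self-describing format such as NetCDF, which stores the metadata inside the file, may be the better fit.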
Citation: https://doi.org/10.5194/hess-2021-392-AC1 -
AC2: 'Principle 2 In-line inclusion', Caitlyn Hall, 09 Aug 2021
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-392/hess-2021-392-AC2-supplement.pdf
-
RC1: 'Comment on hess-2021-392', Francesca Pianosi, 02 Sep 2021
The article gives an interesting contribution to the discussion on Open Science in hydrology. I especially appreciate the practical focus on helping researchers to get started with OS, and the linking to an online repository where new materials and resources will be shared beyond the article publication.
The manuscript needs to be revised to include the section on Principle 2, which was made available as a separate file. I do not have other substantial revisions to recommend but some points for improvement and further discussion.
In particular, given the "practical guide" angle of this article, one suggestion could be to complement the text with tables listing, in a very concise manner, the various recommendations / tips / dos and don'ts for each Principle. For example, for Principle 2 (open software) the list may start like:
- use open-source software such as R, python or QGIS to develop your analysis
- use open-source version control system (e.g. Git) to manage changes to your code
- include documentation as comments embedded in the code as much as possible
- etc.
I think this would help reinforce key messages and help readers navigate the (numerous) points made in the text.
While reading the paper I noted down several other comments. I am not sure they are all worth including/addressing in the paper but I'll report them and leave to the authors to decide if they want to take them onboard.
[1] L. 19 - Social challenges to embracing OS. The authors essentially mention one, the fear of being scooped, but I think others are as important. For example some researchers may be reluctant to share their software as this may bring further scrutiny and criticism of their work. Some seem to feel a sort of "jealousy" for their software, which they don't want to see modified (maybe improved!) by others. Maybe the point here is how we perceive and value intellectual ownership. If I make my software available to others so they will (unavoidably) find bugs to fix and weaknesses to improve, does this diminish or increase the value of my original contribution?
[2] Line 94: "sharing the entire research process and approach (e.g., failed attempts and lessons learned that impacted research outcomes)".
I totally agree although there is a tension here between conciseness (which is needed for readability) and completeness (needed for OS). I think a good way to resolve the conflict is by having unlimited "Supplementary materials" along with a paper - as some journals now allow - so that authors can keep the main article focused on key findings, while giving detailed documentation of all the research process in the SMs.
[3] Line 98: I think the point about "minimising the use of jargon" is very important. We use a lot of academic writing cliches in our articles, perhaps thinking it makes them sound more technically solid, but often it only makes them more difficult to read! Another issue is the recourse to hyper-specialised terms that are only understood within our small research niche - and often take different meanings across sub-communities even within the same broad discipline (a good example: the diverse uses of the term "bottom-up approach" across sub-communities in hydrology and water resource management). Every now and then initiatives are launched to build glossaries that should help researchers navigate each other's jargon, but my impression is they are quickly abandoned (for example years ago I was involved in a project on uncertainty and risk in natural hazard assessment and such glossary was one of the project outputs... I don't think it was ever delivered!). Maybe rather than building glossaries we should just do more to use a common and simpler language. This should include avoiding the creation of new terms for concepts that may be easily described with existing maths terminology. I write and review a lot of modelling/methodological papers, and I have the impression that new terms (and acronyms!) are often created under the pressure to "demonstrate novelty" - authors may be afraid that if their proposed methodology does not have a new name but is described using standard terms from a statistics textbook, reviewers will dispute its novelty.
This way though we make our papers unnecessarily obscure and in the long-run we collectively contribute to fragmentation of knowledge and duplication of efforts. This links back to the general tension I find between OS and the way we reward (over-emphasise?) "novelty" of individual contributions (see also comment [1] above).
[4] L. 193 About green OA. Maybe this sounds naive but I really wonder what are the drawbacks of "green OA" (and how it may be sustainable in the long-term for publishing companies)? Authors do not pay publication fees, readers do not pay subscription fees (after the embargo period) as they can access the non-typeset version... sounds very convenient - but for the publishing companies! Am I missing something?
[5] L. 226: "For software, we suggest the authors start by declaring a permissive license because it improves transparency and reduces downstream licensing conflicts."
I personally agree with this suggestion but I think many (including some at the R&D team of the University I work at!) would find it controversial. A permissive license (say for example the MIT license) implies that the software developed by publicly-funded researchers will be freely available also to users that may make a commercial use of it - and hence may be willing to pay for a licence. So, should universities "give away" a potential source of revenues? Is this fair to tax-payers who funded the research and related software development?
Some more general thoughts:
[6] A lot of the activities for OS mentioned in this paper take time - for self-learning and training and to make data/software accessible to others. For example, documentation is key for open software to be meaningfully used by others but developing good documentation is very time-consuming (this was a key lesson I learnt in my own projects - see Pianosi et al 2020). So I think there is a tension here between OS and the general "publish or perish" attitude. I wonder if Open Science is necessarily also Slow Science (e.g. Frith, 2019)?
[7] Open teaching. I am all in favour for it - like most of us last year I developed lots of materials for on-line learning and I am keen to make it open access (as soon as I find out what's my University policy on this!). This said, I wonder what the long-term implications of open teaching will be for university life. If excellent study materials become available online to all and for all subjects, then what is the reason for enrolling in a university programme instead of self-learning? Will students only attend university to clarify doubts and get assessed - or, in other words, will the main role of universities become accreditation rather than delivering contents? I am not saying this is necessarily a bad thing (maybe it'll give academics more time for research or other type of engagement with students, such as mentoring or research-based teaching) - just highlighting it would be a very substantial change to the way higher education works today.
References
Pianosi et al (2020) How successfully is open-source research software adopted? Results and implications of surveying the users of a sensitivity analysis toolbox, EMS.
Available at: https://research-information.bris.ac.uk/ws/portalfiles/portal/215604556/Paper_SAFE_Survey_accepted.pdf
Frith (2019) Fast Lane to Slow Science, Trends in Cognitive Sciences.
Available at: https://discovery.ucl.ac.uk/id/eprint/10091940/1/Frith_Fast%20Lane%20to%20Slow%20Science%20Prefinal.pdf
Citation: https://doi.org/10.5194/hess-2021-392-RC1 -
AC3: 'Reply on RC1', Caitlyn Hall, 28 Oct 2021
- RC1: 'Comment on hess-2021-392', Francesca Pianosi, 02 Sep 2021
The article gives an interesting contribution to the discussion on Open Science in hydrology. I especially appreciate the practical focus on helping researchers to get started with OS, and the linking to an online repository where new materials and resources will be shared beyond the article publication.
RC1.1: The manuscript needs to be revised to include the section on Principle 2, which was made available as a separate file. I do not have other substantial revisions to recommend but some points for improvement and further discussion.
Response: We are glad to hear that the reviewer found this manuscript interesting and that they found the online repository helpful. We apologize for not including Principle 2 in the original document. That was a misstep on our part and this section will be included in the revised manuscript posted on the HESS discussion board.
RC1.2: In particular, given the "practical guide" angle of this article, one suggestion could be to complement the text with tables listing, in a very concise manner, the various recommendations / tips / dos and don'ts for each Principle. For example, for Principle 2 (open software) the list may start like:
- use open-source software such as R, python or QGIS to develop your analysis
- use open-source version control system (e.g. Git) to manage changes to your code
- include documentation as comments embedded in the code as much as possible
- etc.
I think this would help reinforce key messages and help readers navigate the (numerous) points made in the text
Response: We thank the reviewer for making this great suggestion. We agree that a concise table of tips, tools and resources for each principle would be a great way to summarize main takeaways for readers. We will include an additional table in the Practical Guide section in the revised manuscript and will add the table to the website as well.
While reading the paper I noted down several other comments. I am not sure they are all worth including/addressing in the paper but I'll report them and leave to the authors to decide if they want to take them onboard.
RC1.3: [1] L. 19 - Social challenges to embracing OS. The authors essentially mention one, the fear of being scooped, but I think others are as important. For example some researchers may be reluctant to share their software as this may bring further scrutiny and criticism of their work. Some seem to feel a sort of "jealousy" for their software, which they don't want to see modified (maybe improved!) by others. Maybe the point here is how we perceive and value intellectual ownership. If I make my software available to others so they will (unavoidably) find bugs to fix and weaknesses to improve, does this diminish or increase the value of my original contribution?
Response: Thank you, this is an important addition. We will modify the sentence to:
“...and social (e.g., fear of weaknesses being exposed or ideas being scooped) challenges remain.”
We very much agree that intellectual ownership of code is a big social challenge when it comes to OS. We will include a discussion of this aspect in the revised manuscript. While we cannot solve the problem within the scope of this manuscript, our hope is that it will kick off a wider conversation on how we, as a research community, give credit for and value open software along with its benefits for advancing hydrological research. We look to work being done by the Research Software Alliance, as they aim to do this for research broadly.
RC1.4: [2] Line 94: "sharing the entire research process and approach (e.g., failed attempts and lessons learned that impacted research outcomes)".
I totally agree although there is a tension here between conciseness (which is needed for readability) and completeness (needed for OS). I think a good way to resolve the conflict is by having unlimited "Supplementary materials" along with a paper - as some journals now allow - so that authors can keep the main article focused on key findings, while giving detailed documentation of all the research process in the SMs.
Response: We agree. We will include this as a suggestion explicitly in this line, such that it reads:
"sharing the entire research process and approach (e.g., failed attempts and lessons learned that impacted research outcomes) as appropriate in the main journal article and in more detail in the supplementary materials section of a publication. An additional option for authors is to share the entire research process associated with a publication through the Open Science Foundation’s platform.”
RC1.5: [3] Line 98: I think the point about "minimising the use of jargon" is very important. We use a lot of academic writing cliches in our articles, perhaps thinking it makes them sound more technically solid, but often it only makes them more difficult to read! Another issue is the recourse to hyper-specialised terms that are only understood within our small research niche - and often take different meanings across sub-communities even within the same broad discipline (a good example: the diverse uses of the term "bottom-up approach" across sub-communities in hydrology and water resource management). Every now and then initiatives are launched to build glossaries that should help researchers navigate each other's jargon, but my impression is they are quickly abandoned (for example years ago I was involved in a project on uncertainty and risk in natural hazard assessment and such glossary was one of the project outputs... I don't think it was ever delivered!). Maybe rather than building glossaries we should just do more to use a common and simpler language. This should include avoiding the creation of new terms for concepts that may be easily described with existing maths terminology. I write and review a lot of modelling/methodological papers, and I have the impression that new terms (and acronyms!) are often created under the pressure to "demonstrate novelty" - authors may be afraid that if their proposed methodology does not have a new name but is described using standard terms from a statistics textbook, reviewers will dispute its novelty.
This way though we make our papers unnecessarily obscure and in the long-run we collectively contribute to fragmentation of knowledge and duplication of efforts. This links back to the general tension I find between OS and the way we reward (over-emphasise?) "novelty" of individual contributions (see also comment [1] above).
Response: Thank you to the reviewer for their insightful thoughts on this. We agree that jargon is used as a way to demonstrate novelty, and that glossaries are not a solution (they only reduce the symptoms). We will include the following brief discussion of these two items in the text:
Jargon can be used as a way to demonstrate novelty or describe niche details, and glossaries are not a sustainable solution in the long term for supporting interdisciplinary open science progress. However, concepts can be expressed using simple fundamental terms familiar to scientists across disciplines.
RC1.6: [4] L. 193 About green OA. Maybe this sounds naive but I really wonder what are the drawbacks of "green OA" (and how it may be sustainable in the long-term for publishing companies)? Authors do not pay publication fees, readers do not pay subscription fees (after the embargo period) as they can access the non-typeset version... sounds very convenient - but for the publishing companies! Am I missing something?
Response: Green OA means that the review version of a manuscript (postprint) is shared with the public, while access to the final typeset version remains restricted to subscribers. Since the involved publishers only gain benefit from subscriptions, they have a strong incentive to make the final version more useful to readers than the postprint version. This is often achieved by formatting templates at the postprint stage that make a paper barely readable (e.g. figures separated from figure captions somewhere at the end of the document). So effectively, green-OA is not meant to work for publishers, it is a fall-back solution to tick the open-access box, while still maintaining a strong incentive to pay for the final typeset article for a more convenient reading experience. There are research communities (e.g., AI research) that mostly publish in a Diamond open access format, i.e., no fees for anyone, but obviously these do not follow a subscription-based business model, as green-OA does. We will clarify this in the main article.
RC1.7: [5] L. 226: "For software, we suggest the authors start by declaring a permissive license because it improves transparency and reduces downstream licensing conflicts."
I personally agree with this suggestion but I think many (including some at the R&D team of the University I work at!) would find it controversial. A permissive license (say for example the MIT license) implies that the software developed by publicly-funded researchers will be freely available also to users that may make a commercial use of it - and hence may be willing to pay for a licence. So, should universities "give away" a potential source of revenues? Is this fair to tax-payers who funded the research and related software development?
Response: We agree with the reviewer that a discussion on ownership of software created with public funds was lacking from our paper. We will add more discussion on this. In short, we think that software created by universities should be available for the entire public, including companies. Leaving distribution of this software in the hands of university valorization departments severely limits adoption to the lucky few that can afford the licensing fees and time needed to negotiate access, which are usually large corporations.
Some more general thoughts:
RC1.8: [6] A lot of the activities for OS mentioned in this paper take time - for self-learning and training and to make data/software accessible to others. For example, documentation is key for open software to be meaningfully used by others but developing good documentation is very time-consuming (this was a key lesson I learnt in my own projects - see Pianosi et al 2020). So I think there is a tension here between OS and the general "publish or perish" attitude. I wonder if Open Science is necessarily also Slow Science (e.g. Frith, 2019)?
Response: We thank the reviewer for bringing up this important point about the time it takes to learn new skills, attend training, and make data/software open and accessible. We agree there is a tension between the social pressure to increase the pace of research and the need to ensure research is transparent and well documented. This again highlights the tension between quality and quantity, inherent in any productive environment. OS effectively shifts the balance from quantity to quality, and it necessitates new approaches to research assessment, as formulated, e.g., in the SF DORA declaration (https://sfdora.org/). That said, we will include a discussion of this balance between social pressures and open science and will cite the paper on slow science suggested by the reviewer; we appreciate them pointing it out. We will also stress the importance of preparing future generations of hydrologists by suggesting that OS courses be incorporated into curricula, so that they are set up to work in an OS manner.
RC1.9: [7] Open teaching. I am all in favour for it - like most of us last year I developed lots of materials for on-line learning and I am keen to make it open access (as soon as I find out what's my University policy on this!). This said, I wonder what the long-term implications of open teaching will be for university life. If excellent study materials become available online to all and for all subjects, then what is the reason for enrolling in a university programme instead of self-learning? Will students only attend university to clarify doubts and get assessed - or, in other words, will the main role of universities become accreditation rather than delivering contents? I am not saying this is necessarily a bad thing (maybe it'll give academics more time for research or other type of engagement with students, such as mentoring or research-based teaching) - just highlighting it would be a very substantial change to the way higher education works today.
Response: We thank the reviewer for bringing up open education as an important part of open science. The same arguments used to argue for open science can (and should in our opinion) be used for open education. We recognize that making teaching material openly available will change the way academic knowledge is translated to students and will result in growing pains for those involved. As teachers ourselves we know, both from literature on education as well as from our experiences, that the roles of the teacher and peer interaction are essential in the learning process and cannot be ‘completely replaced’ by online available open education material. For example, networking, lab experiments, interpersonal development, etc. cannot be replicated solely by open education material. Rather, the availability of open education material offers the teacher the possibility to focus their attention on developing learning activities that reinforce concepts provided by open education material and addressing different learning styles to have the most impact on the students.
While the above answer to the reviewer shows our thoughts on Open Education, we are hesitant to add too much of an emphasis on it in the manuscript under review. The focus of the manuscript is on Open Science for hydrological researchers. We do think a separate publication (or maybe even series of publications in a special issue) on Open Education in Hydrology is timely and of great value to the hydrological teaching community. We will add the following text to Section 1, Motivation for Open Hydrology:
“In education, open science makes research outcomes and processes available to teachers of hydrology courses for inclusion in their teaching. A movement parallel to open science, but not the focus of this manuscript, is open education, which argues for and provides tools to share education materials and best practices freely and openly.”
RC1.10: References
Pianosi et al (2020) How successfully is open-source research software adopted? Results and implications of surveying the users of a sensitivity analysis toolbox, EMS.
Available at: https://research-information.bris.ac.uk/ws/portalfiles/portal/215604556/Paper_SAFE_Survey_accepted.pdf
Frith (2019) Fast Lane to Slow Science, Trends in Cognitive Sciences.
Available at: https://discovery.ucl.ac.uk/id/eprint/10091940/1/Frith_Fast%20Lane%20to%20Slow%20Science%20Prefinal.pdf
Citation: https://doi.org/10.5194/hess-2021-392-RC1
Response: We will add these references to the manuscript and refer to them appropriately throughout the manuscript.
Citation: https://doi.org/10.5194/hess-2021-392-AC3 -
AC4: 'Reply on RC1', Caitlyn Hall, 28 Oct 2021
We thank the reviewer for their thoughtful and constructive comments. We are looking forward to implementing them in our final document!
Citation: https://doi.org/10.5194/hess-2021-392-AC4
-
CC1: 'Comment on hess-2021-392', Lina Stein, 21 Sep 2021
The article gives a summary of the current open science movement and advice on how to advance the open hydrology movement specifically. The authors present a list of guiding principles and useful resources on how open science can and should be pursued. I want to thank the authors for this well-written contribution to open science. I only have some minor comments that I hope the authors will take into consideration.
Section: Motivation for Open Hydrology: A not so noble, but potentially convincing reason to adhere to open science standards would be that accessible articles/data/code see more citations. You briefly mention this for the 4th Principle. It might be worth mentioning this connection already in the Introduction.
L47-51: The explanatory sentence “specifically referred to as open hydrology” is a bit confusing, especially with the several citations coming after, it is difficult to connect "research projects" as a continuation of the list started with "open science".
L116: I think it would be better to separate researchers and other stakeholders as interest groups. Co-development with other researchers is much more common. Ideas are discussed and shared at conferences. Carrying the same collaborative effort outside the research community is more of a problem.
L119-124: This transition is a bit sudden. Can you elaborate what FAIR is and what it has to do with stakeholders? I would even recommend mentioning the FAIR standards already further up in the paper. Maybe you can elaborate what FAIR has to do with data management plans (L113).
Additionally, please spell out the acronyms FAIR and CARE at least once.
L140: Maybe "trustworthy" is a better word than “reliable”? They might still be reliable outputs when not open, but would not be trusted by others.
L156 and Principle 3: An explanatory half-sentence on what “Carpentries” is would be helpful. Alternatively, I would follow the advice given by Reviewer 1, Francesca Pianosi, and include overview tables. These can include links to the individual resources, which makes it easier for other researchers to access them. A similar, up-to-date table would be useful for the open hydrology project website as well. While a list of articles relating to open hydrology is a useful resource, a table with direct links would be more easily accessible.
L459: Can you briefly mention Table 2 here, since in the order the document is now, it appears before the scenarios.
L461: There is no Table 3. Please check your Table references in general and in the scenarios specifically. There probably has been a mishap in numbering.
L508: Any advice on how to address the fear of being scooped? Since you mention this worry already in the abstract it would be good to address this in the main article as well.
Principle 2: Is "Water Metadata Language" a fixed term? If it is, I do not know it and further explanation and reference would be helpful.
Citation: https://doi.org/10.5194/hess-2021-392-CC1
AC7: 'Reply on CC1', Caitlyn Hall, 28 Oct 2021
We thank the community reviewer for their thoughtful comments on this piece. We are grateful for their contribution to open science in hydrology and feel that their comments sparked great discussion and will greatly benefit the overall manuscript.
C1.1: The article gives a summary of the current open science movement and advice on how to advance the open hydrology movement specifically. The authors present a list of guiding principles and useful resources on how open science can and should be pursued. I want to thank the authors for this well-written contribution to open science. I only have some minor comments that I hope the authors will take into consideration.
Response: We thank the reviewer for their comments and thoughts on our article. Community feedback is highly appreciated and below we respond to your comments.
C1.2: Section: Motivation for Open Hydrology: A not so noble, but potentially convincing reason to adhere to open science standards would be that accessible articles/data/code see more citations. You briefly mention this for the 4th Principle. It might be worth mentioning this connection already in the Introduction.
Response: We fully agree that mentioning increased citation of Open Science papers in the introduction helps the manuscript and have thus added “An additional benefit lies in increased citation numbers for articles embracing open science, as they contain useful assets on top of the scientific insights offered in the main text (Piwowar et al., 2007)” on line 41.
C1.3: L47-51: The explanatory sentence “specifically referred to as open hydrology” is a bit confusing. Especially with the several citations coming after, it is difficult to connect "research projects" as a continuation of the list started with "open science".
Response: We agree that the sentence length may lead to confusion. We will re-phrase for clarity and to reduce clutter within this sentence.
C1.4: L116: I think it would be better to separate researchers and other stakeholders as interest groups. Co-development with other researchers is much more common. Ideas are discussed and shared at conferences. Carrying the same collaborative effort outside the research community is more of a problem.
Response: This is a good point. We will modify the sentence to:
“Stakeholders usually include fellow researchers, but they may also include industry professionals, non-profit organizations, government officials, communities, members of the public, and other parties that have an interest in hydrologic research.”
C1.5: L119-124: This transition is a bit sudden. Can you elaborate what FAIR is and what it has to do with stakeholders? I would even recommend mentioning the FAIR standards already further up in the paper. Maybe you can elaborate what FAIR has to do with data management plans (L113).
Additionally, please spell out the acronyms FAIR and CARE at least once.
Response: We feel that introducing FAIR in L113 would disturb the flow, but we will edit this section to emphasize and articulate more clearly that FAIR aims mainly at fellow researchers as stakeholders, while CARE encompasses a greater variety of stakeholders in this section. We will modify L119- to:
“Consequently, we suggest incorporating Findable, Accessible, Interoperable and Reusable (FAIR - Wilkinson et al., 2016; Garcia et al., 2020) and, where applicable, Collective Benefit, Authority to Control, Responsibility, and Ethics (CARE - Carroll et al., 2020; Walter et al., 2020) data standards into open hydrology research. While FAIR data standards were developed to improve access to and machine readability of data, mainly to advance further research, and are thus aimed mainly at fellow researchers as stakeholders, CARE data standards encompass a greater variety of stakeholders as they were developed by Indigenous scholars to consider the interests of Indigenous people whenever they are connected with a given dataset (The Global Indigenous Data Alliance, 2019).”
C1.6: L140: Maybe "trustworthy" is a better word than “reliable”? They might still be reliable outputs when not open, but would not be trusted by others.
Response: We agree and will change it appropriately.
C1.7: L156 and Principle 3: An explanatory half-sentence on what “Carpentries” is would be helpful. Alternatively, I would follow the advice given by Reviewer 1, Francesca Pianosi, and include overview tables. These can include links to the individual resources, which makes it easier for other researchers to access them. A similar, up-to-date table would be useful for the open hydrology project website as well. While a list of articles relating to open hydrology is a useful resource, a table with direct links would be more easily accessible.
Response: We agree that a short description of “Carpentries” is indeed necessary. We will include the following in parentheses: “.....”.
In line with Reviewer 1’s suggestion we now present a comprehensive table summarizing tips, tools and resources on four open hydrology principles that the manuscript introduces. We sincerely hope that this table will help hydrologists to adopt open hydrology principles at a practical level.
C1.8: L459: Can you briefly mention Table 2 here, since in the order the document is now, it appears before the scenarios.
Response: We will add the mention to this table in Line 459.
C1.9: L461: There is no Table 3. Please check your Table references in general and in the scenarios specifically. There probably has been a mishap in numbering.
Response: We apologize for the inconvenience this has caused the reviewers. We will check our table references throughout the manuscript and make the required changes in the text.
C1.10: L508: Any advice on how to address the fear of being scooped? Since you mention this worry already in the abstract it would be good to address this in the main article as well.
Response: We thank the community reviewer for pointing out that the fear of being scooped was not mentioned in the main body of the article. It is certainly one of the factors that prevent people from openly sharing their work. We will modify the sentence in L517 to:
“One point to address can be highlighting the potential long-term impact of open hydrology on your career (Allen and Mehler, 2019) and also the fact that early publication in an official repository is a protection against being scooped, as your contribution is then documented with a date attached to it.”
We will also add an additional sentence in the main text, after line 41:
“Another benefit of embracing open science practice is a vastly improved collaboration practice, as intermediate scientific results and ideas are placed in the public domain with clear authorship and date, reducing the potential for being “scooped”, i.e. for seeing your results or ideas published by someone else without proper acknowledgement of the origin (Laine, 2017).”
C1.11: Principle 2: Is "Water Metadata Language" a fixed term? If it is, I do not know it and further explanation and reference would be helpful.
Response: We thank the community reviewer for pointing this out and we have adjusted this to say “...metadata formats following metadata standards based on application and topic, and…”.
Citation: https://doi.org/10.5194/hess-2021-392-AC7
RC2: 'Comment on hess-2021-392', Koen Hufkens, 01 Oct 2021
I enjoyed reading this manuscript. The manuscript argues for the use of open science practices in hydrology and does this admirably in a very structured and transparent way. This makes for a helpful document, as a reference for practitioners but also for teaching purposes. It should facilitate discussions surrounding open science both in hydrology and in other fields (where similar issues exist).
The manuscript is structured along four principles, addressing various stages of research and their potential to be more open. Although there is overlap between them I think this approach works well to address some of the discussions surrounding open science. Further "scenarios" as listed at the end of the manuscript help put some of these principles in context.
As such I'm willing to accept the manuscript as is, as most comments are minor and deal with very nuanced language. Comments listed below should be considered if possible.
General comments:
When looking at the larger picture, it might be very important to note somewhere that open access is not a technical challenge anymore but mostly a socio-cultural one. When looking at table A1 and the challenges discussed only 3 out of 13 are technological, while the remaining are mostly political / socio-cultural. Despite the tools listed in the manuscript, I fear open science practices are not governed by the lack of these tools, the access to them (most of them are free), or even the use of them by some.
Detailed comments:
[Line 83] "Open hydrologists intentionally plan for, describe, and share the entire research process and approach from motivation to the final product"
I would be careful about leaning too strongly on either defining output as products or the fact that a pre-defined motivation should be provided in order to keep things open. From a very practical point of view dealing with science output as products is often very efficient, as it sets clear expectations. However, language matters and managerial language creep is sometimes very toxic as it often debases research (favouring short term returns, products, over slow generation of bodies of knowledge). I would suggest to shy away from defining research as products, and use research outputs / research results instead.
[Line 96] I think asking for a reasonable explanation of methods is ok. However, more often than not the answer might come down to a lack of funding. I wonder to what extent such a motivation will be accepted in review, and if this will stigmatize those who have less resources if there is the expectation that you are really explicit about these methodological choices (limited by funding). This dynamic already exists, but at least the expectation doesn't exist that it is written in full.
[Line 135] Requirements for version control shouldn't be grounded in the ability to peruse through previous in silico experiments. It is a tall order to ask people to use version control in the first place, it is another barrier to do this consistently in the way of a lab notebook. I heavily use git and to be honest my commits aren't clean. It is important that people save the state of the software used in particular experiments (either through a release/checkpoint on Zenodo, GitHub or similar). Actual commits or even branches probably have less value, and make things difficult if not less transparent. That is, I would stress the deposition of code in repositories, rather than front loading additional computational skills (which are for some hard to acquire - saving data/code in a repo isn't). In general, I will take dirty code over no code and a static release tied to a manuscript over a dynamic repo (with recent changes).
[Line 159] This section focusses on software documentation, mostly for the end user. However, true open development is often hampered by not only the lack of user end documentation but proper code comments in the software itself. The lack of clear documentation of the code functioning (not the code use) by inline comments is something that is often forgotten and limits code re-use within different contexts. It also limits learning opportunities by seeing how computational problems are solved within a real world situation, not a classroom setting. A line on this could go a long way.
[Line 160] Include a link to ReadTheDocs, people might not be familiar
[Line 179 (and the section below)] I would suggest an open license. Permissive is the prerogative of the researcher. It must be acknowledged that permissive licenses, on open software and data, in the past have been abused by industry by offloading tasks to OSS volunteers while not giving back proportionally to the gains made (e.g. recent more restrictive licensing on the OpenStreetMap data) and this risk is to be assessed by researchers. I think this is echoed by some other reviewers as well. However, the spirit of open science would suggest a permissive license and I value the sentiment.
Citation: https://doi.org/10.5194/hess-2021-392-RC2
AC5: 'Reply on RC2', Caitlyn Hall, 28 Oct 2021
We very much thank the reviewer for their comments and suggestions. We found that their comments were thoughtful and helpful to improving our manuscript.
I enjoyed reading this manuscript. The manuscript argues for the use of open science practices in hydrology and does this admirably in a very structured and transparent way. This makes for a helpful document, as a reference for practitioners but also for teaching purposes. It should facilitate discussions surrounding open science both in hydrology and in other fields (where similar issues exist).
The manuscript is structured along four principles, addressing various stages of research and their potential to be more open. Although there is overlap between them I think this approach works well to address some of the discussions surrounding open science. Further "scenarios" as listed at the end of the manuscript help put some of these principles in context.
As such I'm willing to accept the manuscript as is, as most comments are minor and deal with very nuanced language. Comments listed below should be considered if possible.
General comments:
R2.1: When looking at the larger picture, it might be very important to note somewhere that open access is not a technical challenge anymore but mostly a socio-cultural one. When looking at table A1 and the challenges discussed only 3 out of 13 are technological, while the remaining are mostly political / socio-cultural. Despite the tools listed in the manuscript, I fear open science practices are not governed by the lack of these tools, the access to them (most of them are free), or even the use of them by some.
Response: We thank the reviewer for this observation. We agree that most challenges are not technical in nature. With this paper we want to contribute to removing the sentiment that open science is a technical problem by exemplifying many tools and resources via the practical guide in Section 2. Furthermore, we provide suggestions on how individual scientists can work on removing the remaining challenges. Hopefully we can contribute to overcoming some of the socio-cultural challenges listed. To this end we have added the following to the appendix discussion: “Note that only three out of thirteen challenges are of a technical nature. This shows that the adoption of Open Science is (no longer) a primarily technical challenge.”
Detailed comments:
R2.2: [Line 83] "Open hydrologists intentionally plan for, describe, and share the entire research process and approach from motivation to the final product"
I would be careful about leaning too strongly on either defining output as products or the fact that a pre-defined motivation should be provided in order to keep things open. From a very practical point of view dealing with science output as products is often very efficient, as it sets clear expectations. However, language matters and managerial language creep is sometimes very toxic as it often debases research (favouring short term returns, products, over slow generation of bodies of knowledge). I would suggest to shy away from defining research as products, and use research outputs / research results instead.
Response: We fully agree that thinking of research as products is counter-productive, and may be one of the contributing factors to non-open-source research being the norm. We will rephrase this principle from “product” to “output” throughout the manuscript as well as in the figure and the tables.
R2.3: [Line 96] I think asking for a reasonable explanation of methods is ok. However, more often than not the answer might come down to a lack of funding. I wonder to what extent such a motivation will be accepted in review, and if this will stigmatize those who have less resources if there is the expectation that you are really explicit about these methodological choices (limited by funding). This dynamic already exists, but at least the expectation doesn't exist that it is written in full.
Response: We thank the reviewer for pointing this out. However, we believe that making science open is not an activity that comes on top or after “science has been done”, but should be an intrinsic part of the scientific method. This is why we advocate for following the open science approach from the start and throughout the process or to incorporate open science in the stage of research one is currently in (even post-publication). This way, promises in terms of quantity of results per amount of funding may need to be scaled down in favour of improved quality of results, i.e. more reproducible results.
In Line 96, we will clarify that this should be done as far as is possible.
Further, we will clarify this in the Summary and Outlook, on Line 245, as such:
“Many funding agencies, publishers, and hydrologic organizations are increasingly requiring hydrologists to adopt open science practices, but not all are aware of the additional effort and time needed, as open science practices need to be implemented throughout the process from the project design and budget generation to the final publication and post-publication curation of data.”
R2.4: [Line 135] Requirements for version control shouldn't be grounded in the ability to peruse through previous in silico experiments. It is a tall order to ask people to use version control in the first place, it is another barrier to do this consistently in the way of a lab notebook. I heavily use git and to be honest my commits aren't clean. It is important that people save the state of the software used in particular experiments (either through a release/checkpoint on Zenodo, GitHub or similar). Actual commits or even branches probably have less value, and make things difficult if not less transparent. That is, I would stress the deposition of code in repositories, rather than front loading additional computational skills (which are for some hard to acquire - saving data/code in a repo isn't). In general, I will take dirty code over no code and a static release tied to a manuscript over a dynamic repo (with recent changes).
Response: We fully agree that ensuring the exact version of an experiment used for a publication is available is more important than tracking all intermediate versions. We found that without consistently using version tracking it becomes almost impossible to track exactly what version of the software was used for what experiment. Deposition of code in repositories is mentioned in Principle 4 as well, but we agree some additional emphasis will be helpful. We will add the following sentence to address this aspect:
Following Line 135, we will add the following: “Even more important than version tracking is depositing the code used for each publication in a repository such as Zenodo for safekeeping, and this is explained in Principle 4”.
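As a minimal sketch of what saving the state of the software behind one publication could involve before it is deposited, the following Python snippet builds a checksum manifest for the files of an analysis. The function name `build_manifest`, the directory layout, and the choice of SHA-256 are illustrative assumptions; neither the manuscript nor any repository prescribes them.

```python
import hashlib
from pathlib import Path


def build_manifest(code_dir):
    """Return {relative_path: sha256_hex} for every file under code_dir.

    Hypothetical helper: the name and the manifest layout are assumptions
    made for illustration only.
    """
    root = Path(code_dir)
    manifest = {}
    # Sort the paths so the manifest order is the same on every run.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # Store POSIX-style relative paths so the manifest is portable.
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest
```

A manifest like this could be stored next to the archived release, so that a reader can check that the files they downloaded are the ones used for the publication.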
R2.5: [Line 159] This section focusses on software documentation, mostly for the end user. However, true open development is often hampered by not only the lack of user end documentation but proper code comments in the software itself. The lack of clear documentation of the code functioning (not the code use) by inline comments is something that is often forgotten and limits code re-use within different contexts. It also limits learning opportunities by seeing how computational problems are solved within a real world situation, not a classroom setting. A line on this could go a long way.
Response: Thank you for this observation. We agree that comments in the software itself are very important. We will add some clarifications in the text that documentation should also be for developers, not only users. We will add additional text on the importance of inline and developer documentation of code.
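To illustrate the difference between user-facing documentation and inline developer comments, here is a small sketch. The function `discharge_to_depth` and its parameter names are hypothetical and were chosen only for this example; the docstring tells a user what goes in and what comes out, while the inline comments explain how the code works.

```python
SECONDS_PER_DAY = 86_400


def discharge_to_depth(discharge_m3_s, area_km2):
    """Convert mean daily discharge (m^3/s) to runoff depth (mm/day).

    This docstring is the user-facing part: inputs, output, and units.
    """
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    # Developer-facing comments start here and explain each step.
    # Volume leaving the catchment in one day, in cubic metres.
    volume_m3 = discharge_m3_s * SECONDS_PER_DAY
    # Catchment area in square metres (1 km^2 = 1e6 m^2).
    area_m2 = area_km2 * 1e6
    # Depth in metres is volume / area; multiply by 1000 for millimetres.
    return volume_m3 / area_m2 * 1000.0
```

Without the inline comments, a reader reusing the code in another context would have to reconstruct the unit conversions on their own.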
R2.6: [Line 160] Include a link to ReadTheDocs, people might not be familiar
Response: With the many technical terms used in the section we fear that adding links to all would clutter the paper too much. But we agree that some of the less common ones could use a link rather than letting the reader search the Internet for the term. We will be incorporating links in our new table, as suggested by the first reviewer, within each section of the summary table that includes tips, tools, and resources that complement the practical guide.
R2.7: [Line 179 (and the section below)] I would suggest an open license. Permissive is the prerogative of the researcher. It must be acknowledged that permissive licenses, on open software and data, in the past have been abused by industry by offloading tasks to OSS volunteers while not giving back proportionally to the gains made (e.g. recent more restrictive licensing on the OpenStreetMap data) and this risk is to be assessed by researchers. I think this is echoed by some other reviewers as well. However, the spirit of open science would suggest a permissive license and I value the sentiment.
Response: This is a good point, and we agree with the reviewer that it is a philosophical question what open license best represents the interests of the stakeholders. We will replace “permissive” by “open” in the text as suggested, while keeping the qualification “that allows editing and sharing derivative works with both scientists and the general public” as it was.
Citation: https://doi.org/10.5194/hess-2021-392-AC5
RC3: 'Comment on hess-2021-392', Anonymous Referee #3, 08 Oct 2021
I really enjoyed this paper. I think it could be a handy product to help guide groups working in hydrology that might not be familiar with all the cutting edge open science tools and general open access principles. I like that it is generally written as "here's what we're trying to do...sometimes that's not possible". The language when that's not the case concerns version control. As a researcher, I do heavily rely on it, but I might tone down the language from "critical" to "desirable" or something like that. The text mentions Travis CI with GitHub; I would consider removing that and replacing it with GitHub Actions.
As a reviewer, it was a challenge to find principle #2...but presumably that will be fixed by the final production.
The authors mention analysis pipelines several times, I was a little surprised to find no mention for some of the "make" based tools (make, targets, probably others). Not critical to add, but might be worth considering.
I liked the scenarios! All in all, I'm in approval of having this paper published. The authors did a great job. I generally agreed with many of the smaller comments made by the other reviewers in the open forum, but I don't think anything rises to a "major" or required revision from my perspective. Great job!
Citation: https://doi.org/10.5194/hess-2021-392-RC3
AC6: 'Reply on RC3', Caitlyn Hall, 28 Oct 2021
We are thankful for the reviewer’s supportive comments on the manuscript’s scope and objective. We sincerely hope that our manuscript, which is the outcome of an inspiring collaboration between early and mid-career hydrologists actively involved in science and research, will motivate and empower many others to engage in open hydrology.
R3.1: I really enjoyed this paper. I think it could be a handy product to help guide groups working in hydrology that might not be familiar with all the cutting edge open science tools and general open access principles. I like that it is generally written as "here's what we're trying to do...sometimes that's not possible". The language when that's not the case concerns version control. As a researcher, I do heavily rely on it, but I might tone down the language from "critical" to "desirable" or something like that. The text mentions Travis CI with GitHub; I would consider removing that and replacing it with GitHub Actions.
Response: We thank the reviewer for their reflection on this; we will change “critical” to “ideal”. Furthermore, we will add GitHub Actions to the list alongside Travis CI with GitHub.
R3.2: As a reviewer, it was a challenge to find principle #2...but presumably that will be fixed by the final production.
Response: We have included it as a comment to the original submission, and it will be incorporated into our current version that reflects all reviewer comments.
R3.3: The authors mention analysis pipelines several times, I was a little surprised to find no mention for some of the "make" based tools (make, targets, probably others). Not critical to add, but might be worth considering.
Response: We agree that these are important aspects to discuss and we will add a short discussion of tools that make analysis easier and more reproducible by automating multiple steps of a workflow.
I liked the scenarios! All in all, I'm in approval of having this paper published. The authors did a great job. I generally agreed with many of the smaller comments made by the other reviewers in the open forum, but I don't think anything rises to a "major" or required revision from my perspective. Great job!
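The core idea behind "make"-style tools can be sketched in a few lines: a workflow step is rerun only when its output is missing or older than one of its inputs. The Python below is a simplified illustration of that rule, not the implementation used by make or targets; the names `needs_rebuild` and `run_step` are hypothetical.

```python
import os


def needs_rebuild(target, inputs):
    """Return True if target is missing or older than any of its inputs."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # Any input modified after the target makes the target stale.
    return any(os.path.getmtime(src) > target_time for src in inputs)


def run_step(target, inputs, action):
    """Call action() only when the target is out of date.

    Returns True if the step ran and False if it was skipped.
    """
    if needs_rebuild(target, inputs):
        action()
        return True
    return False
```

Chaining such steps, for example raw data to cleaned data to figures, means that only the stages affected by a change are recomputed. Dedicated workflow tools add dependency graphs and other features on top of this basic rule.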
Citation: https://doi.org/10.5194/hess-2021-392-AC6